Re: Deploy Solr to Production: guides, best practices

2017-10-19 Thread GW
Not a Windows user, but you should be able to just install it and browse to
port 8983. Once installed, it should show up in Services.

https://www.norconex.com/how-to-run-solr5-as-a-service-on-windows/

On 19 October 2017 at 07:18, maximka19  wrote:

> Rick Leir-2 wrote
> > Maximka
> > The app server is bundled in Solr, so you do not install Tomcat or Jetty
> > separately.
> > Cheers -- Rick
>
> Hi! So, what should I do to host it in Windows Server as service? In
> production.
>
> Thanks
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


Re: Fetch a binary field

2017-08-17 Thread GW
Had the same issue with long base64-encoded images. Binary and string types
failed. Set my field to the "ignored" field type. Doesn't seem right (or
wrong), but it worked.

On 17 August 2017 at 03:58, Rick Leir  wrote:

> On 2017-08-12 04:19 AM, Barbet Alain wrote:
>
>> Hi !
>>
>> Because this field contains a zipped xml that is bigger than all
>> other fields & I don't need it for searching, just for display. Yes,
>> it would be better if this field were outside the Lucene base, but as I
>> have hundreds of bases like that, with millions of documents in each,
>> no, I can't change this & reindex the stuff ...
>>
>> Any other idea ?
>>
> Alain,
> Since nobody else said it, after a long while...
> Your zipped xml could be opened before indexing. You should just index the
> data from the xml which will be needed for display. Hopefully that will
> consume less index space than the whole xml.
> cheers -- Rick
>
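Rick's suggestion above — open the zipped XML before indexing and keep only the fields needed for display — could be sketched like this. The field names and the gzip assumption are illustrative; the thread doesn't say how the XML is compressed.

```python
import gzip
import xml.etree.ElementTree as ET

def extract_display_fields(gzipped_xml: bytes, wanted=("title", "summary")):
    """Decompress the stored XML and keep only the fields needed for
    display, so the index holds a few small strings instead of the blob."""
    root = ET.fromstring(gzip.decompress(gzipped_xml))
    doc = {}
    for name in wanted:
        el = root.find(name)
        if el is not None and el.text:
            doc[name] = el.text
    return doc
```

The returned dict would then be posted to Solr instead of the raw zipped blob.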


Re: MongoDb vs Solr

2017-08-05 Thread GW
Insults for Walter only.. sorry..

On 5 August 2017 at 06:28, GW <thegeofo...@gmail.com> wrote:

> For The Guardian, Solr is the new database | Lucidworks
> https://lucidworks.com/2010/04/29/for-the-guardian-solr-
> is-the-new-database/
> Apr 29, 2010 - For The Guardian, *Solr* is the new *database*. I blogged
> a few days ago about how open search source is disrupting the relationship
> between ...
>
> You are arrogant and probably lame as a programmer.
>
> All offense intended
>
> On 5 August 2017 at 06:23, GW <thegeofo...@gmail.com> wrote:
>
>> Watch their videos
>>
>> On 4 August 2017 at 23:26, Walter Underwood <wun...@wunderwood.org>
>> wrote:
>>
>>> MarkLogic can do many-to-many. I worked there six years ago. They use
>>> search engine index structure with generational updates, including segment
>>> level caches. With locking. Pretty good stuff.
>>>
>>> A many to many relationship is an intersection across posting lists,
>>> with transactions. Straightforward, but not easy to do it fast.
>>>
>>> The “Inside MarkLogic Server” paper does a good job of explaining the
>>> guts.
>>>
>>> Now, back to our regularly scheduled Solr presentations.
>>>
>>> wunder
>>> Walter Underwood
>>> wun...@wunderwood.org
>>> http://observer.wunderwood.org/  (my blog)
>>>
>>>
>>> > On Aug 4, 2017, at 8:13 PM, David Hastings <dhasti...@wshein.com>
>>> wrote:
>>> >
>>> > Also, id love to see an example of a many to many relationship in a
>>> nosql db as you described, since that's a rdbms concept. If it exists in a
>>> nosql environment I would like to learn how...
>>> >
>>> >> On Aug 4, 2017, at 10:56 PM, Dave <hastings.recurs...@gmail.com>
>>> wrote:
>>> >>
>>> >> Uhm. Dude are you drinking?
>>> >>
>>> >> 1. Lucidworks would never say that.
>>> >> 2. Maria is not a json +MySQL. Maria is a fork of the last open
>>> source version of MySQL before oracle bought them
>>> >> 3.walter is 100% correct. Solr is search. The only complex data
>>> structure it has is an array. Something like mongo can do arrays hashes
>>> arrays of hashes etc, it's actually json based. But it can't search well as
>>> a search engine can.
>>> >>
>>> >> There is no one tool. Use each for their own abilities.
>>> >>
>>> >>
>>> >>> On Aug 4, 2017, at 10:35 PM, GW <thegeofo...@gmail.com> wrote:
>>> >>>
>>> >>> The people @ Lucidworks would beg to disagree but I know exactly
>>> what you
>>> >>> are saying Walter.
>>> >>>
>>> >>> A simple flat file like a cardx is fine and dandy as a Solrcloud
>>> noSQL DB.
>>> >>> I like to express it as knowing when to fish and when to cut bait.
>>> As soon
>>> >>> as you are in the one - many or many - many world a real DB is a
>>> whole lot
>>> >>> more sensible.
>>> >>>
>>> >>> Augment your one-many|many-many NoSQL DB with a Solrcloud and you've
>>> got a
>>> >>> rocket. Maria (MySQL with JSON) has had text search for a long time
>>> but It
>>> >>> just does not compare to Solr. Put the two together and you've got
>>> some
>>> >>> serious magic.
>>> >>>
>>> >>> No offense intended, There's nothing wrong with being 97.5% correct.
>>> I wish
>>> >>> I could be 97.5% correct all the time. :-)
>>> >>>
>>> >>>
>>> >>>
>>> >>>> On 4 August 2017 at 18:41, Walter Underwood <wun...@wunderwood.org>
>>> wrote:
>>> >>>>
>>> >>>> Solr is NOT a database. If you need a database, don’t choose Solr.
>>> >>>>
>>> >>>> If you need both a database and search, choose MarkLogic.
>>> >>>>
>>> >>>> wunder
>>> >>>> Walter Underwood
>>> >>>> wun...@wunderwood.org
>>> >>>> http://observer.wunderwood.org/  (my blog)
>>> >>>>
>>> >>>>
>>> >>>>> On Aug 4, 2017, at 4:16 PM, Francesco Viscomi <fvisc...@gmail.com>
>>> >>>> wrote:
>>> >>>>>
>>> >>>>> Hi all,
>>> >>>>> why i have to choose solr if mongoDb is easier to learn and to use?
>>> >>>>> Both are NoSql database, is there a good reason to chose solr and
>>> not
>>> >>>>> mongoDb?
>>> >>>>>
>>> >>>>> thanks really much
>>> >>>>>
>>> >>>>> --
>>> >>>>> Ing. Viscomi Francesco
>>> >>>>
>>> >>>>
>>>
>>>
>>
>


Re: MongoDb vs Solr

2017-08-05 Thread GW
For The Guardian, Solr is the new database | Lucidworks
https://lucidworks.com/2010/04/29/for-the-guardian-solr-is-the-new-database/
Apr 29, 2010 - For The Guardian, *Solr* is the new *database*. I blogged a
few days ago about how open search source is disrupting the relationship
between ...

You are arrogant and probably lame as a programmer.

All offense intended

On 5 August 2017 at 06:23, GW <thegeofo...@gmail.com> wrote:

> Watch their videos
>
> On 4 August 2017 at 23:26, Walter Underwood <wun...@wunderwood.org> wrote:
>
>> MarkLogic can do many-to-many. I worked there six years ago. They use
>> search engine index structure with generational updates, including segment
>> level caches. With locking. Pretty good stuff.
>>
>> A many to many relationship is an intersection across posting lists, with
>> transactions. Straightforward, but not easy to do it fast.
>>
>> The “Inside MarkLogic Server” paper does a good job of explaining the
>> guts.
>>
>> Now, back to our regularly scheduled Solr presentations.
>>
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>>
>>
>> > On Aug 4, 2017, at 8:13 PM, David Hastings <dhasti...@wshein.com>
>> wrote:
>> >
>> > Also, id love to see an example of a many to many relationship in a
>> nosql db as you described, since that's a rdbms concept. If it exists in a
>> nosql environment I would like to learn how...
>> >
>> >> On Aug 4, 2017, at 10:56 PM, Dave <hastings.recurs...@gmail.com>
>> wrote:
>> >>
>> >> Uhm. Dude are you drinking?
>> >>
>> >> 1. Lucidworks would never say that.
>> >> 2. Maria is not a json +MySQL. Maria is a fork of the last open source
>> version of MySQL before oracle bought them
>> >> 3.walter is 100% correct. Solr is search. The only complex data
>> structure it has is an array. Something like mongo can do arrays hashes
>> arrays of hashes etc, it's actually json based. But it can't search well as
>> a search engine can.
>> >>
>> >> There is no one tool. Use each for their own abilities.
>> >>
>> >>
>> >>> On Aug 4, 2017, at 10:35 PM, GW <thegeofo...@gmail.com> wrote:
>> >>>
>> >>> The people @ Lucidworks would beg to disagree but I know exactly what
>> you
>> >>> are saying Walter.
>> >>>
>> >>> A simple flat file like a cardx is fine and dandy as a Solrcloud
>> noSQL DB.
>> >>> I like to express it as knowing when to fish and when to cut bait. As
>> soon
>> >>> as you are in the one - many or many - many world a real DB is a
>> whole lot
>> >>> more sensible.
>> >>>
>> >>> Augment your one-many|many-many NoSQL DB with a Solrcloud and you've
>> got a
>> >>> rocket. Maria (MySQL with JSON) has had text search for a long time
>> but It
>> >>> just does not compare to Solr. Put the two together and you've got
>> some
>> >>> serious magic.
>> >>>
>> >>> No offense intended, There's nothing wrong with being 97.5% correct.
>> I wish
>> >>> I could be 97.5% correct all the time. :-)
>> >>>
>> >>>
>> >>>
>> >>>> On 4 August 2017 at 18:41, Walter Underwood <wun...@wunderwood.org>
>> wrote:
>> >>>>
>> >>>> Solr is NOT a database. If you need a database, don’t choose Solr.
>> >>>>
>> >>>> If you need both a database and search, choose MarkLogic.
>> >>>>
>> >>>> wunder
>> >>>> Walter Underwood
>> >>>> wun...@wunderwood.org
>> >>>> http://observer.wunderwood.org/  (my blog)
>> >>>>
>> >>>>
>> >>>>> On Aug 4, 2017, at 4:16 PM, Francesco Viscomi <fvisc...@gmail.com>
>> >>>> wrote:
>> >>>>>
>> >>>>> Hi all,
>> >>>>> why i have to choose solr if mongoDb is easier to learn and to use?
>> >>>>> Both are NoSql database, is there a good reason to chose solr and
>> not
>> >>>>> mongoDb?
>> >>>>>
>> >>>>> thanks really much
>> >>>>>
>> >>>>> --
>> >>>>> Ing. Viscomi Francesco
>> >>>>
>> >>>>
>>
>>
>


Re: MongoDb vs Solr

2017-08-05 Thread GW
Watch their videos

On 4 August 2017 at 23:26, Walter Underwood <wun...@wunderwood.org> wrote:

> MarkLogic can do many-to-many. I worked there six years ago. They use
> search engine index structure with generational updates, including segment
> level caches. With locking. Pretty good stuff.
>
> A many to many relationship is an intersection across posting lists, with
> transactions. Straightforward, but not easy to do it fast.
>
> The “Inside MarkLogic Server” paper does a good job of explaining the guts.
>
> Now, back to our regularly scheduled Solr presentations.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>
> > On Aug 4, 2017, at 8:13 PM, David Hastings <dhasti...@wshein.com> wrote:
> >
> > Also, id love to see an example of a many to many relationship in a
> nosql db as you described, since that's a rdbms concept. If it exists in a
> nosql environment I would like to learn how...
> >
> >> On Aug 4, 2017, at 10:56 PM, Dave <hastings.recurs...@gmail.com> wrote:
> >>
> >> Uhm. Dude are you drinking?
> >>
> >> 1. Lucidworks would never say that.
> >> 2. Maria is not a json +MySQL. Maria is a fork of the last open source
> version of MySQL before oracle bought them
> >> 3.walter is 100% correct. Solr is search. The only complex data
> structure it has is an array. Something like mongo can do arrays hashes
> arrays of hashes etc, it's actually json based. But it can't search well as
> a search engine can.
> >>
> >> There is no one tool. Use each for their own abilities.
> >>
> >>
> >>> On Aug 4, 2017, at 10:35 PM, GW <thegeofo...@gmail.com> wrote:
> >>>
> >>> The people @ Lucidworks would beg to disagree but I know exactly what
> you
> >>> are saying Walter.
> >>>
> >>> A simple flat file like a cardx is fine and dandy as a Solrcloud noSQL
> DB.
> >>> I like to express it as knowing when to fish and when to cut bait. As
> soon
> >>> as you are in the one - many or many - many world a real DB is a whole
> lot
> >>> more sensible.
> >>>
> >>> Augment your one-many|many-many NoSQL DB with a Solrcloud and you've
> got a
> >>> rocket. Maria (MySQL with JSON) has had text search for a long time
> but It
> >>> just does not compare to Solr. Put the two together and you've got some
> >>> serious magic.
> >>>
> >>> No offense intended, There's nothing wrong with being 97.5% correct. I
> wish
> >>> I could be 97.5% correct all the time. :-)
> >>>
> >>>
> >>>
> >>>> On 4 August 2017 at 18:41, Walter Underwood <wun...@wunderwood.org>
> wrote:
> >>>>
> >>>> Solr is NOT a database. If you need a database, don’t choose Solr.
> >>>>
> >>>> If you need both a database and search, choose MarkLogic.
> >>>>
> >>>> wunder
> >>>> Walter Underwood
> >>>> wun...@wunderwood.org
> >>>> http://observer.wunderwood.org/  (my blog)
> >>>>
> >>>>
> >>>>> On Aug 4, 2017, at 4:16 PM, Francesco Viscomi <fvisc...@gmail.com>
> >>>> wrote:
> >>>>>
> >>>>> Hi all,
> >>>>> why i have to choose solr if mongoDb is easier to learn and to use?
> >>>>> Both are NoSql database, is there a good reason to chose solr and not
> >>>>> mongoDb?
> >>>>>
> >>>>> thanks really much
> >>>>>
> >>>>> --
> >>>>> Ing. Viscomi Francesco
> >>>>
> >>>>
>
>


Re: MongoDb vs Solr

2017-08-04 Thread GW
The people @ Lucidworks would beg to disagree but I know exactly what you
are saying Walter.

A simple flat file like a cardx is fine and dandy as a Solrcloud noSQL DB.
I like to express it as knowing when to fish and when to cut bait. As soon
as you are in the one - many or many - many world a real DB is a whole lot
more sensible.

Augment your one-many|many-many NoSQL DB with a Solrcloud and you've got a
rocket. Maria (MySQL with JSON) has had text search for a long time, but it
just does not compare to Solr. Put the two together and you've got some
serious magic.

No offense intended; there's nothing wrong with being 97.5% correct. I wish
I could be 97.5% correct all the time. :-)



On 4 August 2017 at 18:41, Walter Underwood  wrote:

> Solr is NOT a database. If you need a database, don’t choose Solr.
>
> If you need both a database and search, choose MarkLogic.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>
> > On Aug 4, 2017, at 4:16 PM, Francesco Viscomi 
> wrote:
> >
> > Hi all,
> > why i have to choose solr if mongoDb is easier to learn and to use?
> > Both are NoSql database, is there a good reason to chose solr and not
> > mongoDb?
> >
> > thanks really much
> >
> > --
> > Ing. Viscomi Francesco
>
>


Re: Why do Solr nodes go into Recovery status

2017-06-06 Thread GW
I've heard of systems tanking like this on Windows during OS updates.
Because of this, I run all my updates attended, even though I'm on Linux.
My nodes run as VMs: I shut down Solr gracefully, snapshot the VM, then
update and run. If things go screwy I can always roll back. To me
it sounds like a lack of resources or a kink in your networking, assuming
your set up is correct. Watch for home made network cables. I've seen soft
crimp connectors put on solid wire which can wreck a switch port forever.
Do you have a separate transaction log device on each Zookeeper? I made
this mistake in the beginning and had similar problems under load.


GW

On 5 June 2017 at 22:32, Erick Erickson <erickerick...@gmail.com> wrote:

> bq: This means that technically the replica nodes should not fall behind
> and do
> not have to go into recovery mode
>
> Well, true if nothing weird happens. By "weird" I mean anything that
> interferes with the leader getting anything other than a success code
> back from a follower it sends  document to.
>
> bq: Is this the only scenario in which a node can go into recovery status?
>
> No, there are others. One for-instance: Leader sends a doc to the
> follower and the request times out (huge  GC pauses, the doc takes too
> long to index for whatever reason etc). The leader then sends a
> message to the follower to go directly into the recovery state since
> the leader has no way of knowing whether the follower successfully
> wrote the document to it's transaction log. You'll see messages about
> "leader initiated recovery" in the follower's solr log in this case.
>
> two bits of pedantry:
>
> bq:  Down by the other replicas
>
> Almost. we're talking indexing here and IIUC only the leader can send
> another node into recovery as all updates go through the leader.
>
> If I'm going to be nit-picky, Zookeeper can _also_ cause a node to be
> marked as down if it's periodic ping of the node fails to return.
> Actually I think this is done through another Solr node that ZK
> notifies
>
> bq: It goes into a recovery mode and tries to recover all the
> documents from the leader of shard1.
>
> Also nit-picky. But if the follower isn't "too far" behind it can be
> brought back into sync from via "peer sync" where it gets the missed
> docs sent to it from the tlog of a healthy replica. "Too far" is 100
> docs by default, but can be set in solrconfig.xml if necessary. If
> that limit is exceeded, then indeed the entire index is copied from
> the leader.
>
> Best,
> Erick
>
>
>
> On Mon, Jun 5, 2017 at 5:18 PM, suresh pendap <sureshfors...@gmail.com>
> wrote:
> > Hi,
> >
> > Why and in what scenarios do Solr nodes go into recovery status?
> >
> > Given that Solr is a CP system it means that the writes for a Document
> > index are acknowledged only after they are propagated and acknowledged by
> > all the replicas of the Shard.
> >
> > This means that technically the replica nodes should not fall behind and
> do
> > not have to go into recovery mode.
> >
> > Is my above understanding correct?
> >
> > Can a below scenario happen?
> >
> > 1. Assume that we have 3 replicas for Shard shard1 with the names
> > shard1_replica1, shard1_replica2 and shard1_replica3.
> >
> > 2. Due to some reason, network issue or something else, the
> shard1_replica2
> > is not reachable by the other replicas and it is marked as Down by the
> > other replicas (shard1_replica1 and shard1_replica3 in this case)
> >
> > 3. The network issue is restored and the shard1_replica2 is reachable
> > again. It goes into a recovery mode and tries to recover all the
> documents
> > from the leader of shard1.
> >
> > Is this the only scenario in which a node can go into recovery status?
> >
> > In other words, does the node has to go into a Down status before getting
> > back into a recovery status?
> >
> >
> > Regards
>
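Erick's point above about the 100-doc peer-sync window refers to the update log settings in solrconfig.xml. A hedged sketch of raising it — the values here are illustrative, not recommendations:

```xml
<!-- solrconfig.xml: keep more records in the transaction log so a
     briefly-lagging replica can peer-sync instead of doing a full
     index copy from the leader. -->
<updateHandler class="solr.DirectUpdateHandler2">
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
    <int name="numRecordsToKeep">500</int>
    <int name="maxNumLogsToKeep">20</int>
  </updateLog>
</updateHandler>
```

Raising the window trades tlog disk space for fewer full-index recoveries.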


Re: keywords not found - google like feature

2017-04-13 Thread GW
After reading everyone's post, my thoughts are sometimes things are better
achieved with smoke and mirrors.

I achieved something similar by measuring my scores when there were no keyword
hits. I wrote a simple jQuery script to do a CSS strike-through on the returned
message if the score was poor, plus I returned zero results. I run different
CSS for different messages all the time, working from the vantage point that if
your score is crap, so are the results. Generally I can get my searches down to
['response']['numFound'] = 0 ~ I animate the message sometimes.
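A server-side version of that smoke-and-mirrors check might look like this. The 0.5 threshold is an arbitrary placeholder, and `maxScore` only appears in the response when the request asked for scores:

```python
def should_suppress(solr_response: dict, min_score: float = 0.5) -> bool:
    """Treat a result set as 'not found' when nothing matched or the
    best hit scored below a threshold (the threshold is a guess; tune it)."""
    resp = solr_response.get("response", {})
    if resp.get("numFound", 0) == 0:
        return True
    # maxScore is present only if the query requested scores (fl=*,score)
    return resp.get("maxScore", 0.0) < min_score
```

The front end can then strike through or animate the "no results" message as described.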




On 13 April 2017 at 13:49, Nilesh Kamani  wrote:

> Hello All,
>
> When we search google, sometimes google returns results with mention of
> keywords not found (mentioned as strike-through)
>
> Does Solr provide such feature ?
>
>
> Thanks,
> Nilesh Kamani
>


SQL rpt_location question

2017-03-24 Thread GW
Dear reader,

I've found that using the distinct clause gives me the list I want.

I also have a multivalued rpt_location in the collection that I'd like to
use in the filter.

Is this possible in any way shape of form?

Many thanks in advance,

Greg


Advanced Document Routing Questions

2017-01-29 Thread GW
Hi folks,

1: Can someone point me to some good documentation on how this works? Or is
it so simple that I'm overthinking it?

My understanding of document routing is that I might be able to check the
hash of a shard with the hash of the document id and determine the exact
node / document and know exactly where to send the request reducing
Zookeeper traffic.

I'm getting ready to deploy and I have used the recommended format in my
doc id

All my work is REST/curl -> Solrcloud

I plan to watch cluster status through the admin console REST to and build
a list of OK servers to do the reads for the website.

I have a crawler that will run mostly 3 am Eastern to 3 am Pacific, outside
the bulk of read activity. I plan to do all posts to whichever node has
Zookeeper, according to the admin REST API.

Can I get some reassurance? Be gentle, this is my very first SolrCloud
deployment and it's going to production. I'm about to write scripts for
something in which I still feel conceptually weak.

When I'm done and I totally understand, I promise to publish a nice A - Z
REST deployment HowTo for HA with class examples in (PHP,Perl,Python)/curl.


Best regards,

GW
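The routing idea in the message above — match the document id's hash against each shard's hash range from CLUSTERSTATUS — reduces to a range lookup. A sketch of just that lookup (Solr actually hashes ids with MurmurHash3; the hashing step is omitted here, and the two-shard ranges are illustrative):

```python
def pick_shard(doc_hash: int, shard_ranges: dict) -> str:
    """Return the shard whose [lo, hi] hash range contains doc_hash.
    The ranges come from each shard's 'range' field in CLUSTERSTATUS."""
    for shard, (lo, hi) in shard_ranges.items():
        if lo <= doc_hash <= hi:
            return shard
    raise ValueError("hash outside every shard range")

# Illustrative two-shard split of the signed 32-bit hash space
RANGES = {
    "shard1": (-2**31, -1),
    "shard2": (0, 2**31 - 1),
}
```

Sending the update straight to the shard's leader this way avoids an extra forwarding hop, as Dorian describes elsewhere in this digest.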


Do I need a replication factor if I run an Hadoop file system?

2016-12-20 Thread GW
Dear Users,

I'm in the process of making an app cluster aware and getting ready for
deployment.

I've been looking at running an Hadoop file system. If I have fault
tolerance at the file system it seems that I would be creating a ton of
extra drive i/o to do replication. Am I correct in this assumption?

My data is not really critical, so one of the first things I will be doing is
seeing if I can do a full reindex in a timely manner. My data needs to
change weekly. Can I have a case where replication is not required? It
seems that I might.

So, my thoughts bounce back to Solr on plain old SSD on a journaled file
system.

Thanks,

GW


Re: Has anyone used linode.com to run Solr | ??Best way to deliver PHP/Apache clients with Solr question

2016-12-18 Thread GW
Wow, thanks.

So assuming I have a five node ensemble and one machine is rolling along as
leader, am I correct to assume that as a leader becomes taxed it can lose
the election and another takes over as leader? The leader actually floats
about the ensemble under load? I was thinking the leader was merely for
referential integrity and things stayed that way until a physical failure.

This would all seem important when building indexes.

I think I need to set up a sniffer.

Identifying the node with a hash id seems very cool. If my app makes the
call to the server with the appropriate shard, then there might only be
messaging on the Zookeeper network. Is this a correct assumption?

Is my terminology cross threaded?

Oh well, time to build my first cluster. I wrote all my clients with single
shard collections on a stand alone. Now I need to make sure my app is not a
cluster buster.

I feel like I am on the right path.

Thanks and Best,

GW

On 18 December 2016 at 09:53, Dorian Hoxha <dorian.ho...@gmail.com> wrote:

> On Sun, Dec 18, 2016 at 3:48 PM, GW <thegeofo...@gmail.com> wrote:
>
> > Yeah,
> >
> >
> > I'll look at the proxy you suggested shortly.
> >
> > I've discovered that the idea of making a zookeeper aware app is
> pointless
> > when scripting REST calls right after I installed libzookeeper.
> >
> > Zookeeper is there to provide the zookeeping for Solr: End of story. Me
> > thinks
> >
> > I believe what really has to happen is: connect to the admin API to get
> > status
> >
> > /solr/admin/collections?action=CLUSTERSTATUS
> >
> > I think it is more sensible to make a cluster aware app.
> >
> > <lst name="shards"><lst name="shard1">
> > <str name="range">80000000-7fffffff</str><str name="state">active</str>
> > <lst name="replicas">
> > <str name="core">FrogMerchants_shard1_replica1</str>
> > <str name="base_url">http://10.128.0.2:8983/solr</str>
> > <str name="node_name">10.128.0.2:8983_solr</str>
> > <str name="state">active</str><str name="leader">true</str>
> > </lst></lst></lst>
> >
> > I can get an array of nodes that have a state of active. So if I have 7
> > nodes that are state = active, I will have those in an array. Then I can
> > use rand() funtion with an array count to select a node/url to post a
> json
> > string. It would eliminate the need for a load balancer. I think.
> >
> If you send to random(node), there is high chance(increasing with number of
> nodes/shards) that node won't have the leader, so that node will also
> redirect it to the leader. What you can do, is compute the hash of the 'id'
> field locally. with hash-id you will get shard-id (because each shard has
> the hash-range), and with shard, you will find the leader, and you will
> find on which node the leader is (cluster-status) and send the request
> directly to the leader and be certain that it won't be redirected again
> (less network hops).
>
>
> > //pseudo code
> >
> > $array_count = $count($active_nodes)
> >
> > $url_target = rand(0, $array_count);
> >
> > // creat a function to pull the url   somthing like
> >
> >
> > $url = get_solr_url($url_target);
> >
> > I have test sever on my bench. I'll spin up a 5 node cluster today, get
> my
> > app cluster aware and then get into some Solr indexes with Vi and totally
> > screw with some shards.
> >
> > If I am correct I will post again.
> >
> > Best,
> >
> > GW
> >
> > On 15 December 2016 at 12:34, Shawn Heisey <apa...@elyograg.org> wrote:
> >
> > > On 12/14/2016 7:36 AM, GW wrote:
> > > > I understand accessing solr directly. I'm doing REST calls to a
> single
> > > > machine.
> > > >
> > > > If I have a cluster of five servers and say three Apache servers, I
> can
> > > > round robin the REST calls to all five in the cluster?
> > > >
> > > > I guess I'm going to find out. :-)  If so I might be better off just
> > > > running Apache on all my solr instances.
> > >
> > > If you're running SolrCloud (which uses zookeeper) then sending
> multiple
> > > query requests to any node will load balance the requests across all
> > > replicas for the collection.  This is an inherent feature of SolrCloud.
> > > Indexing requests will be forwarded to the correct place.
> > >
> > > The node you're sending to is a potential single point of failure,
> which
> > > you can eliminate by putting a load balancer in front of Solr that
> > > connects to at least two of the nodes.  As I just mentioned, SolrCloud
> > > will do further load balancing to al

Re: Has anyone used linode.com to run Solr | ??Best way to deliver PHP/Apache clients with Solr question

2016-12-18 Thread GW
Yeah,


I'll look at the proxy you suggested shortly.

I've discovered that the idea of making a zookeeper aware app is pointless
when scripting REST calls right after I installed libzookeeper.

Zookeeper is there to provide the zookeeping for Solr: End of story. Me
thinks

I believe what really has to happen is: connect to the admin API to get
status

/solr/admin/collections?action=CLUSTERSTATUS

I think it is more sensible to make a cluster aware app.

<lst name="shards"><lst name="shard1">
<str name="range">80000000-7fffffff</str><str name="state">active</str>
<lst name="replicas">
<str name="core">FrogMerchants_shard1_replica1</str>
<str name="base_url">http://10.128.0.2:8983/solr</str>
<str name="node_name">10.128.0.2:8983_solr</str>
<str name="state">active</str><str name="leader">true</str>
</lst></lst></lst>

I can get an array of nodes that have a state of active. So if I have 7
nodes that are state = active, I will have those in an array. Then I can
use rand() funtion with an array count to select a node/url to post a json
string. It would eliminate the need for a load balancer. I think.

// pseudo code
$array_count = count($active_nodes);

$url_target = rand(0, $array_count - 1); // rand() is inclusive at both ends

// create a function to pull the url, something like

$url = get_solr_url($url_target);

I have a test server on my bench. I'll spin up a 5-node cluster today, get my
app cluster aware and then get into some Solr indexes with Vi and totally
screw with some shards.

If I am correct I will post again.

Best,

GW

On 15 December 2016 at 12:34, Shawn Heisey <apa...@elyograg.org> wrote:

> On 12/14/2016 7:36 AM, GW wrote:
> > I understand accessing solr directly. I'm doing REST calls to a single
> > machine.
> >
> > If I have a cluster of five servers and say three Apache servers, I can
> > round robin the REST calls to all five in the cluster?
> >
> > I guess I'm going to find out. :-)  If so I might be better off just
> > running Apache on all my solr instances.
>
> If you're running SolrCloud (which uses zookeeper) then sending multiple
> query requests to any node will load balance the requests across all
> replicas for the collection.  This is an inherent feature of SolrCloud.
> Indexing requests will be forwarded to the correct place.
>
> The node you're sending to is a potential single point of failure, which
> you can eliminate by putting a load balancer in front of Solr that
> connects to at least two of the nodes.  As I just mentioned, SolrCloud
> will do further load balancing to all nodes which are capable of serving
> the requests.
>
> I use haproxy for a load balancer in front of Solr.  I'm not running in
> Cloud mode, but a load balancer would also work for Cloud, and is
> required for high availability when your client only connects to one
> server and isn't cloud aware.
>
> http://www.haproxy.org/
>
> Solr includes a cloud-aware Java client that talks to zookeeper and
> always knows the state of the cloud.  This eliminates the requirement
> for a load balancer, but using that client would require that you write
> your website in Java.
>
> The PHP clients are third-party software, and as far as I know, are not
> cloud-aware.
>
> https://wiki.apache.org/solr/IntegratingSolr#PHP
>
> Some advantages of using a Solr client over creating HTTP requests
> yourself:  The code is easier to write, and to read.  You generally do
> not need to worry about making sure that your requests are properly
> escaped for URLs, XML, JSON, etc.  The response to the requests is
> usually translated into data structures appropriate to the language --
> your program probably doesn't need to know how to parse XML or JSON.
>
> Thanks,
> Shawn
>
>


Re: Max vertical scaling in your experience ? (1 instance/server)

2016-12-16 Thread GW
Layer 2 bridge SAN is just for my Apache/apps on Conga so they can be spun
on up any host with a static IP. This has nothing to do with Solr which is
running on plain old hardware.

Solrcloud is on a real cluster not on a SAN.

The bit about dead with no error. I got this from a post I made asking
about the best way to deploy apps. Was shown some code on making your app
zookeeper aware. I am just getting to this so I'm talking from my ass. A ZK
aware program will have a list of nodes ready for business verses a plain
old Round Robin. If data on a machine is corrupted you can get 0 docs found
while a ZK aware app will know that node is shite.







On 16 December 2016 at 07:20, Dorian Hoxha <dorian.ho...@gmail.com> wrote:

> On Fri, Dec 16, 2016 at 12:39 PM, GW <thegeofo...@gmail.com> wrote:
>
> > Dorian,
> >
> > From my reading, my belief is that you just need some beefy machines for
> > your zookeeper ensemble so they can think fast.
>
> Zookeeper need to think fast enough for cluster state/changes. So I think
> it scales with the number of machines/collections/shards and not documents.
>
> > After that your issues are
> > complicated by drive I/O which I believe is solved by using shards. If
> you
> > have a collection running on top of a single drive array it should not
> > compare to writing to a dozen drive arrays. So a whole bunch of light
> duty
> > machines that have a decent amount of memory and barely able process
> faster
> > than their drive I/O will serve you better.
> >
> My dataset will be lower than total memory, so I expect no query to hit
> disk.
>
> >
> > I think the Apache big data mandate was to be horizontally scalable to
> > infinity with cheap consumer hardware. In my minds eye you are not going
> to
> > get crazy input rates without a big horizontal drive system.
> >
> There is overhead with small machines, and with very big machines (pricy).
> So something in the middle.
> So small cluster of big machines or big cluster of small machines.
>
> >
> > I'm in the same boat. All the scaling and roll out documentation seems to
> > reference the Witch Doctor's secret handbook.
> >
> > I just started into making my applications ZK aware and really just
> > starting to understand the architecture. After a whole year I still feel
> > weak while at the same time I have traveled far. I still feel like an
> > amateur.
> >
> > My plans are to use bridge tools in Linux so all my machines are sitting
> on
> > the switch with layer 2. Then use Conga to monitor which apps need to be
> > running. If a server dies, it's apps are spun up on one of the other
> > servers using the original IP and mac address through a bridge firewall
> > gateway so there is no hold up with with mac phreaking like layer 3.
> Layer
> > 3 does not like to see a route change with a mac address. My apps will be
> > on a SAN ~ Data on as many shards/machines as financially possible.
> >
> By conga you mean https://sourceware.org/cluster/conga/spec/ ?
> Also SAN may/will suck like someone answered in your thread.
>
> >
> > I was going to put a bunch of Apache web servers in round robin to talk
> to
> > Solr but discovered that a Solr node can be dead and not report errors.
> >
> Please explain more "dead but no error".
>
> > It's all rough at the moment but it makes total sense to send Solr
> requests
> > based on what ZK says is available verses a round robin.
> >
> Yes, like I commenter wrote on your thread.
>
> >
> > Will keep you posted on my roll out if you like.
> >
> > Best,
> >
> > GW
> >
> >
> >
> >
> >
> >
> >
> > On 16 December 2016 at 03:31, Dorian Hoxha <dorian.ho...@gmail.com>
> wrote:
> >
> > > Hello searchers,
> > >
> > > I'm researching solr for a project that would require a
> > max-inserts(10M/s)
> > > and some heavy facet+fq on top of that, though on low qps.
> > >
> > > And I'm trying to find blogs/slides where people have used some big
> > > machines instead of hundreds of small ones.
> > >
> > > 1. Largest I've found is this
> > > <https://sbdevel.wordpress.com/2016/11/30/70tb-16b-docs-
> > > 4-machines-1-solrcloud/>
> > > with 16cores + 384GB ram but they were using 25! solr4 instances /
> server
> > > which seems wasteful to me ?
> > >
> > > I know that 1 solr can have max ~29-30GB heap because GC is
> > wasteful/sucks
> > > after that, and you should leave the other amount to the os for
> > file-cache.
> > > 2. But do you think 1 instance will be able to fully-use a 256GB/20core
> > > machine ?
> > >
> > > 3. Like to share your findings/links with big-machine clusters ?
> > >
> > > Thank You
> > >
> >
>


Re: Max vertical scaling in your experience ? (1 instance/server)

2016-12-16 Thread GW
Dorian,

From my reading, my belief is that you just need some beefy machines for
your zookeeper ensemble so they can think fast. After that your issues are
complicated by drive I/O which I believe is solved by using shards. If you
have a collection running on top of a single drive array it should not
compare to writing to a dozen drive arrays. So a whole bunch of light duty
machines that have a decent amount of memory and are barely able to process
faster than their drive I/O will serve you better.

I think the Apache big data mandate was to be horizontally scalable to
infinity with cheap consumer hardware. In my minds eye you are not going to
get crazy input rates without a big horizontal drive system.

I'm in the same boat. All the scaling and roll out documentation seems to
reference the Witch Doctor's secret handbook.

I just started into making my applications ZK aware and really just
starting to understand the architecture. After a whole year I still feel
weak while at the same time I have traveled far. I still feel like an
amateur.

My plans are to use bridge tools in Linux so all my machines are sitting on
the switch with layer 2. Then use Conga to monitor which apps need to be
running. If a server dies, it's apps are spun up on one of the other
servers using the original IP and mac address through a bridge firewall
gateway so there is no hold up with mac phreaking like layer 3. Layer
3 does not like to see a route change with a mac address. My apps will be
on a SAN ~ Data on as many shards/machines as financially possible.

I was going to put a bunch of Apache web servers in round robin to talk to
Solr but discovered that a Solr node can be dead and not report errors.
It's all rough at the moment but it makes total sense to send Solr requests
based on what ZK says is available versus a round robin.
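A rough sketch of that idea, assuming the cluster state has already been fetched and parsed from the Collections API CLUSTERSTATUS action (Python just for illustration; the response shape used below is an assumption and should be checked against your Solr version):

```python
def live_replica_urls(cluster_status, collection):
    """Return base URLs of replicas that are active AND sit on a live node.

    cluster_status: parsed JSON from
    /solr/admin/collections?action=CLUSTERSTATUS&wt=json
    (field names here are an assumption sketched from that API).
    """
    cluster = cluster_status["cluster"]
    live_nodes = set(cluster["live_nodes"])
    urls = []
    for shard in cluster["collections"][collection]["shards"].values():
        for replica in shard["replicas"].values():
            # skip replicas that are down, recovering, or on a dead node
            if replica["state"] == "active" and replica["node_name"] in live_nodes:
                urls.append(replica["base_url"])
    return urls
```

Requests can then be spread over the returned list instead of a blind round robin that may include a dead or corrupted node.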

Will keep you posted on my roll out if you like.

Best,

GW







On 16 December 2016 at 03:31, Dorian Hoxha <dorian.ho...@gmail.com> wrote:

> Hello searchers,
>
> I'm researching solr for a project that would require a max-inserts(10M/s)
> and some heavy facet+fq on top of that, though on low qps.
>
> And I'm trying to find blogs/slides where people have used some big
> machines instead of hundreds of small ones.
>
> 1. Largest I've found is this
> <https://sbdevel.wordpress.com/2016/11/30/70tb-16b-docs-
> 4-machines-1-solrcloud/>
> with 16cores + 384GB ram but they were using 25! solr4 instances / server
> which seems wasteful to me ?
>
> I know that 1 solr can have max ~29-30GB heap because GC is wasteful/sucks
> after that, and you should leave the other amount to the os for file-cache.
> 2. But do you think 1 instance will be able to fully-use a 256GB/20core
> machine ?
>
> 3. Like to share your findings/links with big-machine clusters ?
>
> Thank You
>


Re: Has anyone used linode.com to run Solr | ??Best way to deliver PHP/Apache clients with Solr question

2016-12-15 Thread GW
Thanks Tom,

It looks like there is a PHP extension on Git. It seems like a phpized C lib
to create a Zend module to work with ZK. No mention of solr but I'm
guessing I can poll the ensemble for pretty much anything ZK.

Thanks for the direction! A ZK aware app is the way I need to go. I'll give
it a go in the next few days.

Best,

GW





On 15 December 2016 at 09:52, Tom Evans <tevans...@googlemail.com> wrote:

> On Thu, Dec 15, 2016 at 12:37 PM, GW <thegeofo...@gmail.com> wrote:
> > While my client is all PHP it does not use a solr client. I wanted to
> stay
> > with he latest Solt Cloud and the PHP clients all seemed to have some
> kind
> > of issue being unaware of newer Solr Cloud versions. The client makes
> pure
> > REST calls with Curl. It is stateful through local storage. There is no
> > persistent connection. There are no cookies and PHP work is not sticky so
> > it is designed for round robin on both the internal network.
> >
> > I'm thinking we have a different idea of persistent. To me something like
> > MySQL can be persistent, ie a fifo queue for requests. The stack can be
> > always on/connected on something like a heap storage.
> >
> > I never thought about the impact of a solr node crashing with PHP on top.
> > Many thanks!
> >
> > Was thinking of running a conga line (Ricci & Luci projects) and shutting
> > down and replacing failed nodes. Never done this with Solr. I don't see
> any
> > reasons why it would not work.
> >
> > ** When you say an array of connections per host. It would still require
> an
> > internal DNS because hosts files don't round robin. perhaps this is
> handled
> > in the Python client??
>
>
> The best Solr clients will take the URIs of the Zookeeper servers;
> they do not make queries via Zookeeper, but will read the current
> cluster status from zookeeper in order to determine which solr node to
> actually connect to, taking in to account what nodes are alive, and
> the state of particular shards.
>
> SolrJ (Java) will do this, as will pysolr (python), I'm not aware of a
> PHP client that is ZK aware.
>
> If you don't have a ZK aware client, there are several options:
>
> 1) Make your favourite client ZK aware, like in [1]
> 2) Use round robin DNS to distribute requests amongst the cluster.
> 3) Use a hardware or software load balancer in front of the cluster.
> 4) Use shared state to store the names of active nodes*
>
> All apart from 1) have significant downsides:
>
> 2) Has no concept of a node being down. Down nodes should not cause
> query failures, the requests should go elsewhere in the cluster.
> Requires updating DNS to add or remove nodes.
> 3) Can detect "down" nodes. Has no idea about the state of the
> cluster/shards (usually).
> 4) Basically duplicates what ZooKeeper does, but less effectively -
> doesn't know cluster state, down nodes, nodes that are up but with
> unhealthy replicas...
>
> >
> > You have given me some good clarification. I think lol. I know I can spin
> > out WWW servers based on load. I'm not sure how shit will fly spinning up
> > additional solr nodes. I'm not sure what happens if you spin up an empty
> > solr node and what will happen with replication, shards and load cost of
> > spinning an instance. I'm facing some experimentation me thinks. This
> will
> > be a manual process at first, for sure
> >
> > I guess I could put the solr connect requests in my clients into a try
> > loop, looking for successful connections by name before any action.
>
> In SolrCloud mode, you can spin up/shut down nodes as you like.
> Depending on how you have configured your collections, new replicas
> may be automatically created on the new node, or the node will simply
> become part of the cluster but empty, ready for you to assign new
> replicas to it using the Collections API.
>
> You can also use what are called "snitches" to define rules for how
> you want replicas/shards allocated amongst the nodes, eg to avoid
> placing all the replicas for a shard in the same rack.
>
> Cheers
>
> Tom
>
> [1] https://github.com/django-haystack/pysolr/commit/
> 366f14d75d2de33884334ff7d00f6b19e04e8bbf
>


Re: Has anyone used linode.com to run Solr | ??Best way to deliver PHP/Apache clients with Solr question

2016-12-15 Thread GW
While my client is all PHP it does not use a Solr client. I wanted to stay
with the latest Solr Cloud and the PHP clients all seemed to have some kind
of issue being unaware of newer Solr Cloud versions. The client makes pure
REST calls with Curl. It is stateful through local storage. There is no
persistent connection. There are no cookies and PHP work is not sticky so
it is designed for round robin on both the internal network.

I'm thinking we have a different idea of persistent. To me something like
MySQL can be persistent, ie a fifo queue for requests. The stack can be
always on/connected on something like a heap storage.

I never thought about the impact of a solr node crashing with PHP on top.
Many thanks!

Was thinking of running a conga line (Ricci & Luci projects) and shutting
down and replacing failed nodes. Never done this with Solr. I don't see any
reasons why it would not work.

** When you say an array of connections per host. It would still require an
internal DNS because hosts files don't round robin. perhaps this is handled
in the Python client??

You have given me some good clarification. I think lol. I know I can spin
out WWW servers based on load. I'm not sure how shit will fly spinning up
additional solr nodes. I'm not sure what happens if you spin up an empty
solr node and what will happen with replication, shards and load cost of
spinning an instance. I'm facing some experimentation me thinks. This will
be a manual process at first, for sure

I guess I could put the solr connect requests in my clients into a try
loop, looking for successful connections by name before any action.
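That try loop might look something like this (sketched in Python rather than PHP; the admin path, port, and timeout are placeholder assumptions):

```python
import urllib.request

def first_live_host(hosts, ping=None):
    """Walk the host list and return the first one that answers a ping.

    ping is injectable so the loop can be exercised without a live
    cluster; the default hits a Solr admin URL (path/port assumed).
    """
    if ping is None:
        def ping(host):
            try:
                urllib.request.urlopen(
                    "http://%s:8983/solr/admin/info/system" % host, timeout=2)
                return True
            except OSError:
                return False
    for host in hosts:
        if ping(host):
            return host
    return None  # every node was down
```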

Many thanks,

GW




On 15 December 2016 at 04:46, Dorian Hoxha <dorian.ho...@gmail.com> wrote:

> See replies inline:
>
> On Wed, Dec 14, 2016 at 3:36 PM, GW <thegeofo...@gmail.com> wrote:
>
> > Thanks,
> >
> > I understand accessing solr directly. I'm doing REST calls to a single
> > machine.
> >
> > If I have a cluster of five servers and say three Apache servers, I can
> > round robin the REST calls to all five in the cluster?
> >
> I don't know about php, but it would be better to have "persistent
> connections" or something to the solr servers. In python for example this
> is done automatically. It would be better if each php-server has a
> different order of an array of [list of solr ips]. This way each box will
> contact a ~different solr instance, and will have better chance of not
> creating too may new connections (since the connection cache is
> per-url/ip).
>
> >
> > I guess I'm going to find out. :-)  If so I might be better off just
> > running Apache on all my solr instances.
> >
> I've done that before (though with es, but it's ~same). And just contacting
> the localhost solr. The problem with that, is that if the solr on the
> current host fails, your php won't work. So best in this scenario is to
> have an array of hosts, but the first being the local solr.
>
> >
> >
> >
> >
> >
> > On 14 December 2016 at 07:08, Dorian Hoxha <dorian.ho...@gmail.com>
> wrote:
> >
> > > See replies inline:
> > >
> > > On Wed, Dec 14, 2016 at 11:16 AM, GW <thegeofo...@gmail.com> wrote:
> > >
> > > > Hello folks,
> > > >
> > > > I'm about to set up a Web service I created with PHP/Apache <--> Solr
> > > Cloud
> > > >
> > > > I'm hoping to index a bazillion documents.
> > > >
> > > ok , how many inserts/second ?
> > >
> > > >
> > > > I'm thinking about using Linode.com because the pricing looks great.
> > Any
> > > > opinions??
> > > >
> > > Pricing is 'ok'. For bazillion documents, I would skip vps and go
> > straight
> > > dedicated. Check out ovh.com / online.net etc etc
> > >
> > > >
> > > > I envision using an Apache/PHP round robin in front of a solr cloud
> > > >
> > > > My thoughts are that I send my requests to the Solr instances on the
> > > > Zookeeper Ensemble. Am I missing something?
> > > >
> > > You contact with solr directly, don't have to connect to zookeeper for
> > > loadbalancing.
> > >
> > > >
> > > > What can I say.. I'm software oriented and a little hardware
> > challenged.
> > > >
> > > > Thanks in advance,
> > > >
> > > > GW
> > > >
> > >
> >
>


Re: Has anyone used linode.com to run Solr | ??Best way to deliver PHP/Apache clients with Solr question

2016-12-14 Thread GW
Thanks,

I understand accessing solr directly. I'm doing REST calls to a single
machine.

If I have a cluster of five servers and say three Apache servers, I can
round robin the REST calls to all five in the cluster?

I guess I'm going to find out. :-)  If so I might be better off just
running Apache on all my solr instances.





On 14 December 2016 at 07:08, Dorian Hoxha <dorian.ho...@gmail.com> wrote:

> See replies inline:
>
> On Wed, Dec 14, 2016 at 11:16 AM, GW <thegeofo...@gmail.com> wrote:
>
> > Hello folks,
> >
> > I'm about to set up a Web service I created with PHP/Apache <--> Solr
> Cloud
> >
> > I'm hoping to index a bazillion documents.
> >
> ok , how many inserts/second ?
>
> >
> > I'm thinking about using Linode.com because the pricing looks great. Any
> > opinions??
> >
> Pricing is 'ok'. For bazillion documents, I would skip vps and go straight
> dedicated. Check out ovh.com / online.net etc etc
>
> >
> > I envision using an Apache/PHP round robin in front of a solr cloud
> >
> > My thoughts are that I send my requests to the Solr instances on the
> > Zookeeper Ensemble. Am I missing something?
> >
> You contact with solr directly, don't have to connect to zookeeper for
> loadbalancing.
>
> >
> > What can I say.. I'm software oriented and a little hardware challenged.
> >
> > Thanks in advance,
> >
> > GW
> >
>


Has anyone used linode.com to run Solr | ??Best way to deliver PHP/Apache clients with Solr question

2016-12-14 Thread GW
Hello folks,

I'm about to set up a Web service I created with PHP/Apache <--> Solr Cloud

I'm hoping to index a bazillion documents.

I'm thinking about using Linode.com because the pricing looks great. Any
opinions??

I envision using an Apache/PHP round robin in front of a solr cloud

My thoughts are that I send my requests to the Solr instances on the
Zookeeper Ensemble. Am I missing something?

What can I say.. I'm software oriented and a little hardware challenged.

Thanks in advance,

GW


Re: Need help to update multiple documents

2016-11-24 Thread GW
I've not looked at your file. If you are really thinking update, there is
no such thing. You can only replace the entire document or delete it.

On 23 November 2016 at 23:47, Reddy Sankar 
wrote:

> Hi Team ,
>
>
>
> Facing issue to update multiple document in SOLAR at time in my batch job.
>
>
>
> Could you please help me by giving example or an documentation for the
> same.
>
>
>
> Thanks
>
> Sankar Reddy M.B
>


Re: unable to write docs

2016-11-21 Thread GW
Check out Prateeks answer first and use commit wisely.

99% chance it's a commit issue.

On 21 November 2016 at 08:42, Alexandre Rafalovitch 
wrote:

> What's the specific error message for 2). And did it only happen once
> or once in a while?
> 
> http://www.solr-start.com/ - Resources for Solr users, new and experienced
>
>
> On 22 November 2016 at 00:39, Prateek Jain J
>  wrote:
> >
> > 1. Commits are issued every 30 seconds, not on every write operation.
> > 2. logs only has error entries saying it failed to write documents.
> >
> >
> > Regards,
> > Prateek Jain
> >
> > -Original Message-
> > From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
> > Sent: 21 November 2016 12:34 PM
> > To: solr-user 
> > Subject: Re: unable to write docs
> >
> > 1) Are you definitely issuing commits?
> > 2) Do you have anything in the logs? You should if that's an exceptional
> situation.
> >
> > Regards,
> >Alex.
> > 
> > http://www.solr-start.com/ - Resources for Solr users, new and
> experienced
> >
> >
> > On 21 November 2016 at 23:18, Prateek Jain J <
> prateek.j.j...@ericsson.com> wrote:
> >>
> >> Hi All,
> >>
> >> We are observing that SOLR is able to query documents but is failing to
> write documents (create indexes). This is happening only for one core,
> other cores are working fine. Can you think of possible reasons which can
> lead to this? Disk has enough space to write/index and has correct
> permissions. We are using solr 4.8.1 and not using solr-cloud.
> >>
> >> Although, the error is gone after restarting solr but just curious to
> know what can lead to such situation.
> >>
> >>
> >> Regards,
> >> Prateek Jain
> >>
>


Re: How-To: Secure Solr by IP Address

2016-11-04 Thread GW
I run a small solrcloud on a set of internal IP addresses. I connect with a
routed OpenVPN so I hit solr on 10.8.0.1:8983 from my desktop. Only my web
clients are on public IPs and only those clients can talk to the inside
cluster.

That's how I manage things...

On 4 November 2016 at 09:27, David Smiley  wrote:

> I was just researching how to secure Solr by IP address and I finally
> figured it out.  Perhaps this might go in the ref guide but I'd like to
> share it here anyhow.  The scenario is where only "localhost" should have
> full unfettered access to Solr, whereas everyone else (notably web clients)
> can only access some whitelisted paths.  This setup is intended for a
> single instance of Solr (not a member of a cluster); the particular config
> below would probably need adaptations for a cluster of Solr instances.  The
> technique here uses a utility with Jetty called IPAccessHandler --
> http://download.eclipse.org/jetty/stable-9/apidocs/org/
> eclipse/jetty/server/handler/IPAccessHandler.html
> For reasons I don't know (and I did search), it was recently deprecated and
> there's another InetAccessHandler (not in Solr's current version of Jetty)
> but it doesn't support constraints incorporating paths, so it's a
> non-option for my needs.
>
> First, Java must be told to insist on it's IPv4 stack. This is because
> Jetty's IPAccessHandler simply doesn't support IPv6 IP matching; it throws
> NPEs in my experience. In recent versions of Solr, this can be easily done
> just by adding -Djava.net.preferIPv4Stack=true at the Solr start
> invocation.  Alternatively put it into SOLR_OPTS perhaps in solr.in.sh.
>
> Edit server/etc/jetty.xml, and replace the line
> mentioning ContextHandlerCollection with this:
>
> <Set name="handler">
>   <New class="org.eclipse.jetty.server.handler.IPAccessHandler">
>     <Set name="white">
>       <Array type="String">
>         <Item>127.0.0.1</Item>
>         <Item>-.-.-.-|/solr/techproducts/select</Item>
>       </Array>
>     </Set>
>     <Set name="whiteListByPath">false</Set>
>     <Set name="handler">
>       <New class="org.eclipse.jetty.server.handler.ContextHandlerCollection"/>
>     </Set>
>   </New>
> </Set>
>
> This mechanism wraps ContextHandlerCollection (which ultimately serves
> Solr) with this handler that adds the constraints.  These constraints above
> allow localhost to do anything; other IP addresses can only access
> /solr/techproducts/select.  That line could be duplicated for other
> white-listed paths -- I recommend creating request handlers for your use,
> possibly with invariants to further constraint what someone can do.
>
> note: I originally tried inserting the IPAccessHandler in
> server/contexts/solr-jetty-context.xml but found that there's a bug in
> IPAccessHandler that fails to consider when HttpServletRequest.getPathInfo
> is null.  And it wound up letting everything through (if I recall).  But I
> like it up in server.xml anyway as it intercepts everything
>
> ~ David
>
> --
> Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
> LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
> http://www.solrenterprisesearchserver.com
>


Re: Heatmap in JSON facet API

2016-10-30 Thread GW
If we are talking about the same kind of heat maps you might want to look
at the TomTom map API for a quick and dirty yet solid solution. Just supply
a whack of coordinates and let TomTom do the work. The Heat maps will zoom
in and de-cluster.

Example below.

http://www.frogclassifieds.com/tomtom/markers-clustering.html


On 28 October 2016 at 09:05, Никита Веневитин 
wrote:

> Hi!
> Is it possible to use JSON facet API to get heatmaps?
>


Re: SolrJ for .NET / C#

2016-08-16 Thread GW
Interesting, I managed to do Solr SQL

On 16 August 2016 at 12:22, Joe Lawson <jlaw...@opensourceconnections.com>
wrote:

> The sad part of doing plain old REST requests is you basically miss out on
> all the SolrCloud features that are inherent in client call optimization
> and collection discovery. It would be nice if some companies made /contrib
> offerings for different languages that could be better maintained.
>
> Most REST clients are stuck in a pre-SolrCloud world or master/slave
> configuration and that paradigm is going away.
>
> On Tue, Aug 16, 2016 at 10:43 AM, GW <thegeofo...@gmail.com> wrote:
>
> > The client that comes with PHP is lame. If installed you should
> un-install
> > php5-solr and install the Pecl/Pear libs which are good to the end of 5.x
> > and 6.01. It tanks with 6.1.
> >
> > I defer to my own effort of changing everything to plain old REST
> requests.
> >
> > On 16 August 2016 at 10:39, GW <thegeofo...@gmail.com> wrote:
> >
> > > As long as you are .NET you will be last in line. You try using the
> REST
> > > API. All you get with a .NET/C# lib is a wrapper for the REST API.
> > >
> > >
> > >
> > > On 16 August 2016 at 09:08, Joe Lawson <jlawson@
> > opensourceconnections.com>
> > > wrote:
> > >
> > >> All I have seen is SolrNET, forks of SolrNET and people using
> RestSharp.
> > >>
> > >> On Tue, Aug 16, 2016 at 9:01 AM, Eirik Hungnes <hung...@rubrikk.no>
> > >> wrote:
> > >>
> > >> > Hi
> > >> >
> > >> > I have been looking around for a library for .NET / C#. We are
> > currently
> > >> > using SolrNet, but that is ofc not as well equipped as SolrJ, and
> have
> > >> > heard rumors occasionally about someone, also Lucene, has been
> working
> > >> on a
> > >> > port to other languages?
> > >> >
> > >> > --
> > >> > Best regards,
> > >> >
> > >> > Eirik
> > >> >
> > >>
> > >>
> > >>
> > >> --
> > >> -Joe
> > >>
> > >
> > >
> >
>
>
>
> --
> -Joe
>


Re: SolrJ for .NET / C#

2016-08-16 Thread GW
The client that comes with PHP is lame. If installed you should un-install
php5-solr and install the Pecl/Pear libs, which are good through the end of
5.x and 6.0.1. It tanks with 6.1.

I defer to my own effort of changing everything to plain old REST requests.

On 16 August 2016 at 10:39, GW <thegeofo...@gmail.com> wrote:

> As long as you are .NET you will be last in line. You try using the REST
> API. All you get with a .NET/C# lib is a wrapper for the REST API.
>
>
>
> On 16 August 2016 at 09:08, Joe Lawson <jlaw...@opensourceconnections.com>
> wrote:
>
>> All I have seen is SolrNET, forks of SolrNET and people using RestSharp.
>>
>> On Tue, Aug 16, 2016 at 9:01 AM, Eirik Hungnes <hung...@rubrikk.no>
>> wrote:
>>
>> > Hi
>> >
>> > I have been looking around for a library for .NET / C#. We are currently
>> > using SolrNet, but that is ofc not as well equipped as SolrJ, and have
>> > heard rumors occasionally about someone, also Lucene, has been working
>> on a
>> > port to other languages?
>> >
>> > --
>> > Best regards,
>> >
>> > Eirik
>> >
>>
>>
>>
>> --
>> -Joe
>>
>
>


Re: SolrJ for .NET / C#

2016-08-16 Thread GW
As long as you are .NET you will be last in line. Try using the REST
API directly: all you get with a .NET/C# lib is a wrapper for the REST API.



On 16 August 2016 at 09:08, Joe Lawson 
wrote:

> All I have seen is SolrNET, forks of SolrNET and people using RestSharp.
>
> On Tue, Aug 16, 2016 at 9:01 AM, Eirik Hungnes  wrote:
>
> > Hi
> >
> > I have been looking around for a library for .NET / C#. We are currently
> > using SolrNet, but that is ofc not as well equipped as SolrJ, and have
> > heard rumors occasionally about someone, also Lucene, has been working
> on a
> > port to other languages?
> >
> > --
> > Best regards,
> >
> > Eirik
> >
>
>
>
> --
> -Joe
>


Re: solr date range query

2016-08-16 Thread GW
This query indicates two separate date fields.

Because the two range clauses are ORed, the query will return results even
if eventEnddate is ten years in the past, as long as eventStartdate falls
inside the requested range.
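For the ongoing-event case described elsewhere in this thread (event3/event4, which span the window without starting or ending inside it), the usual approach is a range-overlap filter rather than two ORed ranges: match events that start before the window ends and end after the window starts. A sketch of building that query string (Python only for illustration; field names as in the thread):

```python
def overlap_query(window_start, window_end):
    """Solr query matching every event whose [start, end] overlaps the window."""
    return ("eventStartdate:[* TO %s] AND eventEnddate:[%s TO *]"
            % (window_end, window_start))
```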





On 16 August 2016 at 08:16, solr2020  wrote:

> eventStartdate:[2016-08-02T00:00:00Z TO 2016-08-05T23:59:59.999Z] OR
> eventEnddate:[2016-08-02T00:00:00Z TO 2016-08-05T23:59:59.999Z]
>
> this is my query.
>
>
>
> --
> View this message in context: http://lucene.472066.n3.
> nabble.com/solr-date-range-query-tp4291918p4291922.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: solr date range query

2016-08-16 Thread GW
can you send the query you are using?

On 16 August 2016 at 08:03, solr2020  wrote:

> yes. dates are stored as a single valued date field
>
>
>
> --
> View this message in context: http://lucene.472066.n3.
> nabble.com/solr-date-range-query-tp4291918p4291920.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: solr date range query

2016-08-16 Thread GW
Am I to assume these dates are stored in a single multivalued field?

On 16 August 2016 at 07:51, solr2020  wrote:

> Hi,
>
> We have list of events with events start date and end date.for eg:
> event1 starts @ 2nd Aug 2016 ends @ 3rd Aug 2016
> event2 starts @ 4th Aug 2016 ends @ 5th Aug 2016
> event3 starts @ 1st Aug 2016 ends @ 7th Aug 2016
> event4 starts @ 15th july 2016 ends @ 15th Aug 2016
>
> when user selects a date range Aug 2nd to Aug 5th 2016 we are able to fetch
> event1 and event2 with start and end date range query (Aug 2nd  TO Aug 5th
> ). But as event3 and event4 are also an ongoing event we need to fetch that
> . how this can be achieved?
>
> Thanks.
>
>
>
> --
> View this message in context: http://lucene.472066.n3.
> nabble.com/solr-date-range-query-tp4291918.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Inconsistent results with solr admin ui and solrj

2016-08-13 Thread GW
No offense intended, but you are looking at a problem with your work. You
need to explain what you are doing, not what is happening.

If you are trying to use PHP and the latest PECL/PEAR, it does not work so
well. It is considerably older than Solr 6.1.
This was the only issue I ran into with 6.1.






On 13 August 2016 at 06:10, Pranaya Behera  wrote:

> Hi,
> I am running solr 6.1.0 with solrcloud. We have 3 instance of
> zookeeper and 3 instance of solrcloud. All three of them are active and up.
> One collection has 3 shards, each shard has 2 replicas.
>
> Everytime query whether from solrj or admin ui, getting inconsistent
> results. e.g.
> 1. numFound is always fluctuating.
> 2. facet count shows the count for a field, filter query on that field
> gets 0 results.
> 3. luke requests work(not sure whether gives correct info of all the
> dynamic field) on per shard not on collection when invoked from curl but
> doesnt work when called from solrj.
> 4. admin ui shows expanded results, same query goes from solrj,
> getExpandedResults() gives 0 docs.
>
> What would be cause of all this ? Any pointer to look for an error
> anything in the logs.
>


Re: solr error

2016-08-02 Thread GW
K, After installing the latest PECL for Solr in PHP I found that it fails
with Solr 6.1.0. This is Debian 8.

Anyway, I just ended up writing a function using curl to post JSON to Solr.
This is working with PHP5.6 & The latest Solrcloud

No more Solr client in PHP for me lol.

Parallel SQL is a reality in 6

Simple function as follows

<?php
// Post a JSON string to Solr's /update/json/docs handler with commit=true
function JSONpost($solrserver, $collection, $jsondata) {

    $solrPOSTurl = "http://".$solrserver.":8983/solr/".$collection."/update/json/docs?commit=true";

    $ch = curl_init($solrPOSTurl);

    curl_setopt($ch, CURLOPT_CUSTOMREQUEST, "POST");
    curl_setopt($ch, CURLOPT_POSTFIELDS, $jsondata);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_HTTPHEADER, array(
        'Content-Type: application/json',
        'Content-Length: ' . strlen($jsondata))
    );

    $result = curl_exec($ch);
    curl_close($ch);

    return $result;
}


///
End ~ stick the function above in an include file.


Then in your page:

<?php
$data = array("id" => "55i", "name" => " WTF Hagrid");
$json_data_string = json_encode($data);

JSONpost($solrserver, $collection, $json_data_string);
?>



On 2 August 2016 at 08:57, GW <thegeofo...@gmail.com> wrote:

> best to get a build document together to ensure your server is correct.
>
> testing with a simple curl get/post
>
> I use PHP and Perl all the time and have to say the overall docs suck
> because the technology changes fast. The coolest thing about Solrcloud is
> it changes fast. For instance apt-get php5-solr on Ubuntu/Debian will give
> you a very old client. It's so old it is a total waste of time. The Pear
> libs are where you need to be. I'm trying to use Solrcloud 6.1.0 and what
> used to work for 6.0.1
>
>
> So it would appear that a strong ability with REST clients would be the


Re: solr error

2016-08-02 Thread GW
It's best to get a build document together to ensure your server is set up
correctly. Start by testing with a simple curl GET/POST.

I use PHP and Perl all the time and have to say the overall docs suck
because the technology changes fast. The coolest thing about Solrcloud is
it changes fast. For instance, apt-get install php5-solr on Ubuntu/Debian
will give you a very old client. It's so old it is a total waste of time.
The Pear libs are where you need to be. I'm trying to use Solrcloud 6.1.0,
and what used to work for 6.0.1 no longer seems to.


So it would appear that a strong ability with REST clients would be the
answer.

With my current task I have many JSON data sources and a JSON capable
repository (Solr). If I use the SolrClient to read/post to the REST api,
the data is read into variables and then moved to another set and posted.

With Curl, I can take the JSON string from the server on the left and post
it directly to the server on the right.

I'm just refining the functions. Will send them to you shortly.

At the end of the day, a good knowledge of REST apis is where everything
happens in PHP. My current problem seems to be with the latest Pecl
SolrClient and latest Solrcloud so I am reverting to posting with Curl.
I've been doing my gets with Curl because I had a similar issue 5-6 months
ago.

I'll post those functions in a hour or so.

Best,

GW
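For anyone following along, here is roughly what the curl-style JSON post looks like when scripted. This is a hedged sketch in Python rather than PHP; the host, collection name, and document fields are made up for illustration:

```python
import json

def build_update_request(base_url, collection, docs, commit=True):
    """Build the URL and JSON body for posting documents to Solr's
    /update handler. Solr accepts a JSON array of documents."""
    url = "%s/solr/%s/update?commit=%s" % (base_url, collection,
                                           str(commit).lower())
    body = json.dumps(docs).encode("utf-8")
    return url, body

url, body = build_update_request("http://localhost:8983", "products",
                                 [{"id": "doc1", "name": "Aviator Sunglasses"}])
print(url)
# Sending it is then one HTTP POST, e.g. urllib.request.Request(url,
# data=body, headers={"Content-Type": "application/json"}) plus urlopen().
```

The same pattern maps one-to-one onto PHP's curl_setopt calls.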

On 2 August 2016 at 01:46, Midas A <test.mi...@gmail.com> wrote:

> Jürgen,
> we are using Php solrclient  and getting above exception . what could be
> the reason for the same  please elaborate.
>
> On Tue, Aug 2, 2016 at 11:10 AM, Midas A <test.mi...@gmail.com> wrote:
>
> > curl: (52) Empty reply from server
> > what could be the case .and what should i do to minimize.
> >
> >
> >
> >
> > On Tue, Aug 2, 2016 at 10:38 AM, Walter Underwood <wun...@wunderwood.org
> >
> > wrote:
> >
> >> I recommend you look at the PHP documentation to find out what “HTTP
> >> Error 52” means.
> >>
> >> You can start by searching the web for this: php http error 52
> >>
> >> wunder
> >> Walter Underwood
> >> wun...@wunderwood.org
> >> http://observer.wunderwood.org/  (my blog)
> >>
> >>
> >> > On Aug 1, 2016, at 10:04 PM, Midas A <test.mi...@gmail.com> wrote:
> >> >
> >> > please reply .
> >> >
> >> > On Tue, Aug 2, 2016 at 10:24 AM, Midas A <test.mi...@gmail.com>
> wrote:
> >> >
> >> >> Hi,
> >> >>
> >> >> i am connecting solr with php and getting *HTTP Error 52, and *HTTP
> >> Error
> >> >> 20* error *frequently .
> >> >> what should i do to minimize these issues .
> >> >>
> >> >> Regards,
> >> >> Abhishek T
> >> >>
> >> >>
> >>
> >>
> >
>


Apache/PHP/Perl round robin to Solrcloud question

2016-07-31 Thread GW
I'm a mostly a developer and my systems work is all out of necessity so a
bit weak.

My current development happens on a single server running Apache/PHP/Perl
as well as Solrcloud in the Googlecloud.

Perl and PHP make REST calls to Two collections on the single server
Solrcloud 6.1 @ 127.0.0.1

My thoughts are I can make this a private cloud image and then add servers
if required.

So my question is what is the best way deploy my app server & solr.

Can I just leave my apps making reads from 127.0.0.1 and add each server to
the zone file as www.mydomain.com for round robin

My Solrcloud is populated by a custom spider written in Perl which I
imagine I will only post to the leader.

Am I going sideways?

Many thanks,

GW


Re: Recommended api/lib to search Solr using PHP

2016-05-30 Thread GW
I would say look at the URLs for the searches you build in the Admin query tool.

In my case

http://172.16.0.1:8983/solr/#/products/query

When you build queries with the Query tool, for example an edismax query,
the URL is there for you to copy.
Use the URL structure with curl in your programming/scripting. The results
come back as JSON (or whichever wt format you request).

This is what I do with PHP and it's pretty tight.
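The same URL-building idea, sketched in Python to make the encoding step explicit (the field names, boosts, and row count here are only illustrative):

```python
from urllib.parse import urlencode

def build_query_url(base_url, collection, user_query, rows=13, start=0):
    """Build a Solr query URL; urlencode() handles the escaping that
    raw string concatenation gets wrong."""
    params = {
        "q": 'name:"%s"^20 OR short_description:"%s"~6' % (user_query,
                                                           user_query),
        "rows": rows,
        "start": start,
        "fl": "*,score",
        "wt": "json",
    }
    return "%s/solr/%s/query?%s" % (base_url, collection, urlencode(params))

print(build_query_url("http://localhost:8983", "products", "red shoes"))
```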


On 30 May 2016 at 02:29, scott.chu  wrote:

>
> We have two legacy in-house applications written in PHP 5.2.6 and 5.5.3.
> Our engineers currently just use fopen with a URL to search Solr, but it's
> kinda insufficient when we want to do more advanced, complex queries. We've
> tried to use something called 'Solarium', but its installation steps have
> something to do with Symfony, which is kinda complicated. We can't get the
> installation done OK. I'd like to know if there are some other
> better-structured PHP libraries or APIs?
>
> Note: Solr is 5.4.1.
>
> scott.chu,scott@udngroup.com
> 2016/5/30 (週一)
>


Re: Is there an equivalent to an SQL "select distinct" in Solr

2016-05-13 Thread GW
Thank you Shawn,

I will toy with these over the weekend. Solr/Hadoop/Hbase has been a nasty
learning curve for me.
It probably would have been a lot easier if I didn't have 30 years of
RDBMS stuck in my head.

Again,

Many thanks for your response.
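For the record, a facet query such as q=*:*&rows=0&facet=true&facet.field=category&facet.limit=-1 returns the distinct values as a flat [value, count, value, count, ...] list, which needs a little client-side unpacking. A sketch in Python (the sample response below is invented):

```python
def distinct_values(facet_response, field):
    """Unpack Solr's flat [value, count, value, count, ...] facet list
    into (value, count) pairs, dropping zero-count buckets."""
    flat = facet_response["facet_counts"]["facet_fields"][field]
    return [(flat[i], flat[i + 1])
            for i in range(0, len(flat), 2) if flat[i + 1] > 0]

sample = {"facet_counts": {"facet_fields": {
    "category": ["Eyewear", 85, "Shoes", 40, "Hats", 0]}}}
print(distinct_values(sample, "category"))  # → [('Eyewear', 85), ('Shoes', 40)]
```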


On 13 May 2016 at 08:57, Shawn Heisey <apa...@elyograg.org> wrote:

> On 5/13/2016 6:48 AM, GW wrote:
> > Let's say I have 10,000 documents and there is a field named "category"
> and
> > lets say there are 200 categories but I do not know what they are.
> >
> > My question: Is there a query/filter that can pull a list of distinct
> > categories?
>
> Sounds like a job for faceting or grouping.  Which one of them to use
> will depend on exactly what you're trying to obtain in your results.
>
> https://cwiki.apache.org/confluence/display/solr/Faceting
> https://cwiki.apache.org/confluence/display/solr/Result+Grouping
>
> Thanks,
> Shawn
>
>


Is there an equivalent to an SQL "select distinct" in Solr

2016-05-13 Thread GW
Let's say I have 10,000 documents and there is a field named "category" and
lets say there are 200 categories but I do not know what they are.

My question: Is there a query/filter that can pull a list of distinct
categories?

Thanks in advance,

GW


Re: issues doing a spatial query

2016-04-29 Thread GW
I realise the world-wrap thing, but the data is correct ~ they are
coordinates taken from Google Maps. It does not really matter though. I
switched the query to use geofilt and everything is fine.

Here's the kicker.

There is a post somewhere online that says you cannot use geofilt with
multivalued location_RPT. I lost months because I did not try it.

If i use geofilt with the coordinates in question (last in the multvalue)
with a distance of 1km I get a perfect result. In fact I can get a perfect
single direct hit on any of the values with geofilt + distance +
multivalued.


People that don't know what they are talking about should not post.

Many thanks for your response.

GW
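For anyone who lands on this thread: a geofilt filter over a multivalued location field can be sketched like this. The sketch is in Python; the point and distance are the values discussed above, and the field name comes from my schema:

```python
from urllib.parse import urlencode

def geofilt_params(field, lat, lon, km):
    """Build q/fq parameters for a {!geofilt} radius filter; urlencode()
    escapes the braces and comma in the local-params syntax."""
    return urlencode({
        "q": "*:*",
        "fq": "{!geofilt sfield=%s pt=%s,%s d=%s}" % (field, lat, lon, km),
        "wt": "json",
    })

print(geofilt_params("locations", 49.8522263, -97.1390697, 1))
```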


On 29 April 2016 at 00:40, David Smiley <david.w.smi...@gmail.com> wrote:

> Hi.
> This makes sense to me.  The point 49.8,-97.1 is in your query box.  The
> box is lower-left to upper-right, so your box is actually an almost
> world-wrapping one grabbing all longitudes except  -93 to -92.  Maybe you
> mean to switch your left & right.
>
> On Sun, Apr 24, 2016 at 8:03 PM GW <thegeofo...@gmail.com> wrote:
>
> > I was not getting the results I expected so I started testing with the
> solr
> > webclient
> >
> > Maybe I don't understand things.
> >
> > simple test query
> >
> > q=*:*&fq=locations:[49,-92 TO 50,-93]
> >
> > I don't understand why I get a result set for longitude range -92 to -93
> > but should be zero results as far as I understand.
> >
> >
> >
> 
> >
> > {
> >   "responseHeader": {
> > "status": 0,
> > "QTime": 2,
> > "params": {
> >   "q": "*:*",
> >   "indent": "true",
> >   "fq": "locations:[49,-92 TO 50,-93]",
> >   "wt": "json",
> >   "_": "1461541195102"
> > }
> >   },
> >   "response": {
> > "numFound": 85,
> > "start": 0,
> > "docs": [
> >   {
> > "id": "data.spidersilk.co!337",
> > "entity_id": "337",
> > "type_id": "simple",
> > "gender": "Male",
> > "name": "Aviator Sunglasses",
> > "short_description": "A timeless accessory staple, the
> > unmistakable teardrop lenses of our Aviator sunglasses appeal to
> > everyone from suits to rock stars to citizens of the world.",
> > "description": "Gunmetal frame with crystal gradient
> > polycarbonate lenses in grey. ",
> > "size": "",
> > "color": "",
> > "zdomain": "magento.spidersilk.co",
> > "zurl":
> > "
> >
> http://magento.spidersilk.co/index.php/catalog/product/view/id/337/s/aviator-sunglasses/
> > ",
> > "main_image_url":
> > "
> >
> http://magento.spidersilk.co/media/catalog/product/cache/0/image/9df78eab33525d08d6e5fb8d27136e95/a/c/ace000a_1.jpg
> > ",
> > "keywords": "Eyewear  ",
> > "data_size": "851,564",
> > "category": "Eyewear",
> > "final_price_without_tax": "295,USD",
> > "image_url": [
> >   "
> > http://magento.spidersilk.co/media/catalog/product/a/c/ace000a_1.jpg;,
> >   "
> > http://magento.spidersilk.co/media/catalog/product/a/c/ace000b_1.jpg;
> > ],
> > "locations": [
> >   "37.4463603,-122.1591775",
> >   "42.5857514,-82.8873787",
> >   "41.6942622,-86.2697108",
> >   "49.8522263,-97.1390697"
> > ],
> > "_version_": 1532418847465799700
> >   },
> >
> >
> >
> > Thanks,
> >
> > GW
> >
> --
> Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
> LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
> http://www.solrenterprisesearchserver.com
>


Re: Set router.field in unit tests

2016-04-29 Thread GW
Not exactly sure what you mean, but I think you want to change your
schema.xml



to




restart solr


On 29 April 2016 at 06:04, Markus Jelsma  wrote:

> Hi - any hints to share?
>
> Thanks!
> Markus
>
>
>
> -Original message-
> > From:Markus Jelsma 
> > Sent: Thursday 28th April 2016 13:30
> > To: solr-user 
> > Subject: Set router.field in unit tests
> >
> > Hi - i'm working on a unit test that requires the cluster's router.field
> to be set to a field different than ID. But i can't find it?! How can i set
> router.field with AbstractFullDistribZkTestBase?
> >
> > Thanks!
> > Markus
> >
>


issues doing a spatial query

2016-04-24 Thread GW
I was not getting the results I expected so I started testing with the solr
webclient

Maybe I don't understand things.

simple test query

q=*:*&fq=locations:[49,-92 TO 50,-93]

I don't understand why I get a result set for longitude range -92 to -93
but should be zero results as far as I understand.



{
  "responseHeader": {
"status": 0,
"QTime": 2,
"params": {
  "q": "*:*",
  "indent": "true",
  "fq": "locations:[49,-92 TO 50,-93]",
  "wt": "json",
  "_": "1461541195102"
}
  },
  "response": {
"numFound": 85,
"start": 0,
"docs": [
  {
"id": "data.spidersilk.co!337",
"entity_id": "337",
"type_id": "simple",
"gender": "Male",
"name": "Aviator Sunglasses",
"short_description": "A timeless accessory staple, the
unmistakable teardrop lenses of our Aviator sunglasses appeal to
everyone from suits to rock stars to citizens of the world.",
"description": "Gunmetal frame with crystal gradient
polycarbonate lenses in grey. ",
"size": "",
"color": "",
"zdomain": "magento.spidersilk.co",
"zurl":
"http://magento.spidersilk.co/index.php/catalog/product/view/id/337/s/aviator-sunglasses/;,
"main_image_url":
"http://magento.spidersilk.co/media/catalog/product/cache/0/image/9df78eab33525d08d6e5fb8d27136e95/a/c/ace000a_1.jpg;,
"keywords": "Eyewear  ",
"data_size": "851,564",
"category": "Eyewear",
"final_price_without_tax": "295,USD",
"image_url": [
  
"http://magento.spidersilk.co/media/catalog/product/a/c/ace000a_1.jpg;,
  "http://magento.spidersilk.co/media/catalog/product/a/c/ace000b_1.jpg;
],
"locations": [
  "37.4463603,-122.1591775",
  "42.5857514,-82.8873787",
  "41.6942622,-86.2697108",
  "49.8522263,-97.1390697"
],
"_version_": 1532418847465799700
  },



Thanks,

GW


Re: need help with keyword spamming

2016-04-23 Thread GW
No. My project is retail based. I mean people putting in a slew of
irrelevant keywords in addition to relevant keywords in an attempt to get
hits on searches and hits outside of context.

I used a filter factory to remove duplicates.
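The BM25 saturation Doug describes below is easy to see numerically; a quick Python sketch of just the term-frequency component (length normalization disabled, i.e. b = 0):

```python
def bm25_tf(tf, k1=1.2):
    """BM25 term-frequency component with b = 0: tf*(k1+1)/(tf+k1).
    It asymptotically approaches k1 + 1, so repeating a keyword
    hundreds of times gains a spammer almost nothing."""
    return tf * (k1 + 1) / (tf + k1)

for tf in (1, 5, 50, 500):
    print(tf, round(bm25_tf(tf), 3))
```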

On 23 April 2016 at 11:30, Doug Turnbull <
dturnb...@opensourceconnections.com> wrote:

> By keyword spamming, do you mean stuffing the same term over and over to
> game term frequency?
>
> If so You might want to try tuning BM25 similarity for your needs. It has a
> saturation point for term frequency.
>
>
> http://opensourceconnections.com/blog/2015/10/16/bm25-the-next-generation-of-lucene-relevation/
>
> You can also write your own similarity that sets a max for term frequency.
>
> I'd also consider figuring out if you can build a page rank like measure
> that can signal content trustworthiness. Spammer sites won't be linked to
> very heavily by trusted sites.
>
> If you just mean spamming like lots of unique keywords, length
> normalization was built just for this reason: to bias relevance toward less
> verbose and more specific matches
>
> Hope that helps
>
> Doug
> On Sat, Apr 23, 2016 at 10:02 AM GW <thegeofo...@gmail.com> wrote:
>
> > Hey all,
> >
> > I'm just finishing up a project and I'm hoping for some direction on
> > dealing with keyword spamming.
> >
> > I don't have any urgent issues. I can foresee some bumps in the road.
> >
> > I'm using a custom spider that pulls inventory data from several dozen
> > sources into a single doc schema. 1 record per item per location.
> >
> > Data from several sources have an existing keyword field. Some records
> > coming in have empty or null data for keywords.
> >
> > I concatenated my category and keyword data into the keyword field so I
> > would not have any empty keyword data to satisfy a query builder.
> >
> > I have a recommended keyword list I could use to count hits before I
> index.
> > It's a painful thought.
> >
> > I want to be able to detect people that are trying to do keyword
> spamming.
> >
> > So my question is: Is there some kind of FM that I'm not aware of?
> >
> > Thanks in advance,
> >
> > GW
> >
>


need help with keyword spamming

2016-04-23 Thread GW
Hey all,

I'm just finishing up a project and I'm hoping for some direction on
dealing with keyword spamming.

I don't have any urgent issues. I can foresee some bumps in the road.

I'm using a custom spider that pulls inventory data from several dozen
sources into a single doc schema. 1 record per item per location.

Data from several sources have an existing keyword field. Some records
coming in have empty or null data for keywords.

I concatenated my category and keyword data into the keyword field so I
would not have any empty keyword data to satisfy a query builder.

I have a recommended keyword list I could use to count hits before I index.
It's a painful thought.

I want to be able to detect people that are trying to do keyword spamming.

So my question is: Is there some kind of FM that I'm not aware of?

Thanks in advance,

GW


Re: FW: SolrCloud App Unit Testing

2016-03-19 Thread GW
I think the easiest way to write apps for Solr is with some kind of
programming language and the REST API. Don't bother with the PHP or Perl
modules. They are deprecated and beyond useless. Just use the HTTP call
that you see in Solr Admin. Mind the URL encoding when putting together
your server calls.

I've used Perl and PHP with Curl to create Solr Apps

PHP:

function fetchContent($URL){
$ch = curl_init();
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_URL, $URL);
$data = curl_exec($ch);
$httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close($ch);
if ($httpCode == "404") {
$data="nin";
}
return $data;
}


switch($filters){

    case "":
        $url = "http://localhost:8983/solr/products/query?q=name:".$urlsearch."^20+OR+short_description:".$urlsearch."~6&rows=13&start=".$start."&fl=*,score&wt=json";
        break;

    case "clothing":
        $url = "http://localhost:8983/solr/products/query?q=name:%22".$urlsearch."%22^20+OR+short_description:%22".$urlsearch."%22~6&rows=13&start=".$start."&fl=*,score&wt=json";
        break;

    case "beauty cosmetics":
        $url = "http://localhost:8983/solr/products/query?q=name:".$urlsearch."^20+OR+short_description:".$urlsearch."~6&rows=13&start=".$start."&fl=*,score&wt=json";
        break;
}


$my_data = fetchContent($url);


Data goes into the $my_data as a JSON string in this case.


Your forward-facing app can sit behind an Apache round robin in
front of a Solr system. This gives you insane scalability in both the
client app and the Solr service.


Hope that helps.

GW


On 17 March 2016 at 11:23, Madhire, Naveen <naveen.madh...@capitalone.com>
wrote:

>
> Hi,
>
> I am writing a Solr Application, can anyone please let me know how to Unit
> test the application?
>
> I see we have MiniSolrCloudCluster class available in Solr, but I am
> confused about how to use that for Unit testing.
>
> How should I create a embedded server for unit testing?
>
>
>
> Thanks,
> Naveen
> 
>
> The information contained in this e-mail is confidential and/or
> proprietary to Capital One and/or its affiliates and may only be used
> solely in performance of work or services for Capital One. The information
> transmitted herewith is intended only for use by the individual or entity
> to which it is addressed. If the reader of this message is not the intended
> recipient, you are hereby notified that any review, retransmission,
> dissemination, distribution, copying or other use of, or taking of any
> action in reliance upon this information is strictly prohibited. If you
> have received this communication in error, please contact the sender and
> delete the material from your computer.
>


Re: alternative forum for SOLR user

2016-02-01 Thread GW
I personally hate email lists..

But this one is actually pretty good. Excellent actually.

I'm a convert.

I joined it with Google mail, forward everything to a folder, and search it.

Piece of cake.


On 1 February 2016 at 11:08, Jean-Jacques MONOT  wrote:

> Thank you for the very quick answer : the mailing list is very efficient.
>
> The trouble with a mailing list is that I will receive a lot of messages in
> my mailbox. I will see if I unsubscribe ...
>
>
>   From: Binoy Dalal
>  To: SOLR Users
>  Sent: Monday, 1 February 2016, 9:30 AM
>  Subject: Re: alternative forum for SOLR user
>
> This is the forum if you want help. There are additional forums for dev and
> other discussions.
> Check it out here: lucene.apache.org/solr/resources.html
>
> If you are looking for the archives just Google solr user list archive.
>
> On Mon, 1 Feb 2016, 13:43 Jean-Jacques MONOT  wrote:
>
> > Hello
> >
> > I am a newbie with SOLR and just registered to this mailing list.
> >
> > Is there an alternative forum for SOLR users? I am using this mailing
> > list for support, but did not find a "real" web forum.
> >
> > JJM
> >
> > ---
> > This email has been checked for viruses by the Avast
> > antivirus software.
> > https://www.avast.com/antivirus
> >
> > --
> Regards,
> Binoy Dalal
>
>
>


Re: Manage schema.xml via Solrj?

2016-01-08 Thread GW
Bob,

Not sure why you would want to do this. You can set up Solr to guess the
schema: it creates a file called managed-schema as an override. This is
the case with 5.3. I came across it by accident setting it up the first
time, and I was a little annoyed, but it made for a quick setup. Your
programming would still need to recognise the new document structure and
use it. The only problem is it's a bit generic in its guesswork, and I did
not spend much time testing it out, so I am not really versed in operating
it. I got myself back to schema.xml ASAP. My thought is you are looking at
a lot of work for little gain.

Best,

GW
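One more option worth noting for the original question: recent Solr versions expose a Schema API, so schema changes can be made with a plain HTTP POST (from Java, SolrJ's SchemaRequest classes wrap the same endpoint). A hedged sketch of building such a request, in Python, with a hypothetical core and field name:

```python
import json

def add_field_request(base_url, core, name, field_type,
                      stored=True, indexed=True):
    """Build the URL and JSON body for a Schema API add-field command."""
    url = "%s/solr/%s/schema" % (base_url, core)
    body = json.dumps({"add-field": {
        "name": name, "type": field_type,
        "stored": stored, "indexed": indexed,
    }})
    return url, body

url, body = add_field_request("http://localhost:8983", "products",
                              "keywords", "text_general")
print(url)
# POST the body with Content-Type: application/json to apply the change.
```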



On 7 January 2016 at 21:36, Bob Lawson <bwlawson...@gmail.com> wrote:

> I want to programmatically make changes to schema.xml using java to do
> it.  Should I use Solrj to do this or is there a better way?  Can I use
> Solrj to make the rest calls that make up the schema API?  Whatever the
> answer, can anyone point me to an example showing how to do it?  Thanks!
>
>


Re: Count multivalued field issue

2016-01-06 Thread GW
When dealing with Solr data you need to decide whether or not to handle it
in client code.

When I want to count a multivalued field, I do it in client code:

$count = count($array);
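In other words, once the document comes back from Solr as JSON, the multivalued count is one line of client code. A Python sketch using the field name from this thread (the values are invented):

```python
# a document as returned in a Solr JSON response (values invented)
doc = {"id": "42", "EmailListS": ["a@example.com", "b@example.com",
                                  "c@example.com"]}
count = len(doc.get("EmailListS", []))  # 0 if the field is absent
print(count)
```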



On 6 January 2016 at 08:43, marotosg  wrote:

> Hi,
>
> I am trying to add a new field to my schema to add the number of items of a
> multivalued field.
> I am using solr 4.11
>
> These are my fields in *schema.xml*:
> <field name="EmailListS" type="string" indexed="true" multiValued="true" stored="true" />
> <field name="EmailListCountD" type="tint" indexed="true" stored="true" />
>
> Here is the update done to my *solrconfig.xml*. I created an
> updateRequestProcessorChain
> and added it to the update handler:
>
> <requestHandler name="/update" class="solr.UpdateRequestHandler">
>   <lst name="defaults">
>     <str name="update.chain">countfields</str>
>   </lst>
> </requestHandler>
>
> <updateRequestProcessorChain name="countfields">
>   <processor class="solr.CloneFieldUpdateProcessorFactory">
>     <str name="source">EmailListS</str>
>     <str name="dest">EmailListCountD</str>
>   </processor>
>   <processor class="solr.CountFieldValuesUpdateProcessorFactory">
>     <str name="fieldName">EmailListCountD</str>
>   </processor>
>   <processor class="solr.DefaultValueUpdateProcessorFactory">
>     <str name="fieldName">EmailListCountD</str>
>     <int name="value">0</int>
>   </processor>
>   <processor class="solr.RunUpdateProcessorFactory" />
> </updateRequestProcessorChain>
>
> Am I doing something wrong here?
>
> Thanks for your help.
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Count-multivalued-field-issue-tp4248878.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Solr5.X document loss in splitting shards

2015-12-28 Thread GW
I don't use Curl but there are a couple of things that come to mind

1: Maybe use document routing with the shards. Use an "!" in your unique
ID. I'm using gmail to read this and it sucks for searching content so if
you have done this please ignore this point. Example: If you were storing
documents per domain you unique field values would look like
www.domain1.com!123,  www.domain1.com!124,
   www.domain2.com!35, etc.

This should create a two segment hash for searching shards. I do this in
blind faith as a best practice as it is mentioned in the docs.

2: Curl works best with URL encoding. I was using Curl at one time and I
noticed some strange results w/o url encoding

What are you using to write your client?

Best,

GW
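To make point 1 concrete: with the default compositeId router, Solr hashes the part of the key before the "!" to pick a shard, so all documents sharing a prefix land on the same shard. A small Python sketch using the IDs from the message:

```python
def routed_id(prefix, local_id):
    """Compose a compositeId-routed unique key, 'prefix!localId'.
    Solr hashes the prefix, so docs for one prefix map to one shard."""
    return "%s!%s" % (prefix, local_id)

ids = [routed_id("www.domain1.com", n) for n in (123, 124)]
ids.append(routed_id("www.domain2.com", 35))
print(ids)
```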



On 27 December 2015 at 19:35, Shawn Heisey <apa...@elyograg.org> wrote:

> On 12/26/2015 11:21 AM, Luca Quarello wrote:
> > I have a SOLR 5.3.1 CLOUD with two nodes and 8 shards per node.
> >
> > Each shard is about* 35 million documents (**35025882**) and 16GB sized.*
> >
> >
> >- I launch the SPLIT command on a shard (shard 13) in the ASYNC way:
>
> 
>
> > The new created shards have:
> > *13430316 documents (5.6 GB) and 13425924 documents (5.59 GB**)*.
>
> Where are you looking that shows you the source shard has 35 million
> documents?  Be extremely specific.
>
> The following screenshot shows one place you might be looking for this
> information -- the core overview page:
>
> https://www.dropbox.com/s/311n49wkp9kw7xa/admin-ui-core-overview.png?dl=0
>
> Is the core overview page where you are looking, or is it somewhere else?
>
> I'm asking because "Max Doc" and "Num Docs" on the core overview page
> mean very different things.  The difference between them is the number
> of deleted docs, and the split shards are probably missing those deleted
> docs.
>
> This is the only idea that I have.  If it's not that, then I'm as
> clueless as you are.
>
> Thanks,
> Shawn
>
>


Re: AJAX access to Solr Server

2015-12-26 Thread GW
Yes, your proxy seems to work.

The only thing that bothers me is anyone can query your Solr installation.

The world is not a nice place and I can't tell you how many DOS attacks
I've fended off in the last 30 years.

If I thought you were an a-hole I could set up a few machines and query
your server to a standstill.

About ten years ago I was working on a contract. The competitor that lost
the bid did a email DOS attack on me after they took out a whole bunch car
adds (hot deals) in the local paper.  My email was f###'d and my phone was
ringing off the hook.

Cheers,

GW


On 25 December 2015 at 21:55, Doug Turnbull <
dturnb...@opensourceconnections.com> wrote:

> Yeah I prefer a whitelist of locked down query request handlers via a
> proxy that are reasonably well protected. I would never expose update to
> the web or allow any updating over a public interface.
>
> If you want an example, you can checkout
>
>
> http://solr.quepid.com/solr/statedecoded/select?q=*:*&qt=update&stream.body=<delete><query>*:*</query></delete>&commit=true
>
> http://solr.quepid.com/solr/statedecoded/update?stream.body=<delete><query>*:*</query></delete>&commit=true
>
> But still get search results back:
> http://solr.quepid.com/solr/statedecoded/select?q=*:*
>
> Click all those all day long. And do let me know if you find holes... I'm
> sure there's room for improvement
>
> Cheers,
> -Doug
>
> On Friday, December 25, 2015, GW <thegeofo...@gmail.com> wrote:
>
> > If you are using Linux a simple one liner in IP tables
> >
> > iptables -I INPUT \! --src www.yourwebserver.com -m tcp -p tcp --dport
> > 8983 -j DROP
> >
> >
> > If windows, you can do something similar
> >
> > otherwise it is very easy for anyone to delete all your documents with
> >
> > http://yoursolrserver.com:8983/solr/your-core/update?stream.body=<delete><query>*:*</query></delete>&commit=true
> >
> >
> >
> >
> > On 25 December 2015 at 20:42, Doug Turnbull <
> > dturnb...@opensourceconnections.com <javascript:;>> wrote:
> >
> > > Hi Shawn
> > >
> > > Maybe I should have qualified the parameters of scenarios this make me
> > > comfortable just proxying Solr directly w/o an API
> > >
> > > These situations include:
> > >
> > > 1. I've got no qualms about giving the whole world access to every
> > document
> > > in the index. There's nothing protected about anything.
> > > 2. The content can be easily rebuilt , it's not very large. (I can
> easily
> > > push a button and make a new one)
> > >
> > > Sure you can denial of service Solr, and I might lose my search index.
> > But
> > > you can denial of service anything. This includes just about anything
> you
> > > put in front of Solr. Moreover, the added complexity of a
> > > Drupal/Wordpress/your API might only add to the security problems with
> > > their own security issues. I'd rather keep it simple and have fewer
> > moving
> > > parts.
> > >
> > > Cases where I would want an API in front of Solr (these are just the
> > > security ones):
> > > - I want to protect the content (ie based on some notion of a "user" or
> > > other permissions)
> > > - Rebuilding the content would be very hard and time consuming
> > >
> > > I would also say to expose Solr directly to everyone you probably
> should
> > > know about Solr's bugaboos:
> > > - the lovely qt parameter and the request dispatcher (the nginx proxy
> > below
> > > disallows qt)
> > > - deep paging (prevented by the nginx proxy)
> > > - how to lock down a request handler fairly robustly, how to use
> > invariants
> > > - mitigating intentionally malicious queries (such as the lovely
> "sleep"
> > > function query).
> > >
> > > I'm also curious to hear what the websolr people do, or anyone else
> that
> > > hosts Solr for the JavaScript app development crowd.
> > >
> > > Cheers
> > > -Doug
> > >
> > >
> > > On Friday, December 25, 2015, Shawn Heisey <apa...@elyograg.org
> > <javascript:;>> wrote:
> > >
> > > > On 12/25/2015 12:17 PM, Eric Dain wrote:
> > > > > Does allowing javascript direct access to SolrCloud raise security
> > > > concern?
> > > > > should I build a REST service in between?
> > > > >
> > > > > I need to provide async search capability to web pages. the pages
> > will
> > > be
> > > > > public with no authentication.
> > > >
> > > > End users should never have access to Solr.

Re: AJAX access to Solr Server

2015-12-26 Thread GW
What are you using for a client?

I generally use a REST client written in PHP or Perl and then prevent cross
scripting so only the client can do the work.

My Solr cluster is running behind OpenVPN on 172.16.0.0/24

I use a jQuery script on the following site to get an infinite scroll:

http://www.frogshopping.com

The cross-scripting protection is not in place yet.

On 26 December 2015 at 09:59, Doug Turnbull <
dturnb...@opensourceconnections.com> wrote:

> True, though you could also query an API in front of Solr to a standstill
> pretty easily.  DoSing is a pretty easy thing to do to anything that needs
> to be open to the public.
>
> The biggest issue with the proxy approach is an attacker with Solr
> knowledge that doesn't need to DoS, just send a handful of really slow
> queries to Solr. This is something that can be mitigated, but the more
> skilled the attacker the more interesting the slow queries get.
>
> I should note this is a problem to mitigate with any system that handles
> user queries by passing them directly to edismax. Solr sort of encourages
> you to talk straight to edismax, and most systems don't prepare or escape
> the query. Instead they want to support the full range of query operations.
> An attacker can still put nasty function queries in the query box enough
> times to make a Solr server crawl.
>
> Doug
>
> On Saturday, December 26, 2015, GW <thegeofo...@gmail.com> wrote:
>
> > Yes, your proxy seems to work.
> >
> > The only thing that bothers me is anyone can query your Solr
> installation.
> >
> > The world is not a nice place and I can't tell you how many DOS attacks
> > I've fended off in the last 30 years.
> >
> > If I thought you were an a-hole I could set up a few machines and query
> > your server to a standstill.
> >
> > About ten years ago I was working on a contract. The competitor that lost
> > the bid did a email DOS attack on me after they took out a whole bunch
> car
> > adds (hot deals) in the local paper.  My email was f###'d and my phone
> was
> > ringing off the hook.
> >
> > Cheers,
> >
> > GW
> >
> >
> > On 25 December 2015 at 21:55, Doug Turnbull <
> > dturnb...@opensourceconnections.com <javascript:;>> wrote:
> >
> > > Yeah I prefer a whitelist of locked down query request handlers via a
> > > proxy that are reasonably well protected. I would never expose update
> to
> > > the web or allow any updating over a public interface.
> > >
> > > If you want an example, you can checkout
> > >
> > >
> > >
> >
> > > http://solr.quepid.com/solr/statedecoded/select?q=*:*&qt=update&stream.body=<delete><query>*:*</query></delete>&commit=true
> > >
> > > http://solr.quepid.com/solr/statedecoded/update?stream.body=<delete><query>*:*</query></delete>&commit=true
> > >
> > > But still get search results back:
> > > http://solr.quepid.com/solr/statedecoded/select?q=*:*
> > >
> > > Click all those all day long. And do let me know if you find holes...
> I'm
> > > sure there's room for improvement
> > >
> > > Cheers,
> > > -Doug
> > >
> > > On Friday, December 25, 2015, GW <thegeofo...@gmail.com
> <javascript:;>>
> > wrote:
> > >
> > > > If you are using Linux a simple one liner in IP tables
> > > >
> > > > iptables -I INPUT \! --src www.yourwebserver.com -m tcp -p tcp
> --dport
> > > > 8983 -j DROP
> > > >
> > > >
> > > > If windows, you can do something similar
> > > >
> > > > otherwise it is very easy for anyone to delete all your documents
> with
> > > >
> > > > http://yoursolrserver.com:8983/solr/your-core/update?stream.body=<delete><query>*:*</query></delete>&commit=true
> > > >
> > > >
> > > >
> > > >
> > > > On 25 December 2015 at 20:42, Doug Turnbull <
> > > > dturnb...@opensourceconnections.com <javascript:;> <javascript:;>>
> > wrote:
> > > >
> > > > > Hi Shawn
> > > > >
> > > > > Maybe I should have qualified the parameters of scenarios this make
> > me
> > > > > comfortable just proxying Solr directly w/o an API
> > > > >
> > > > > These situations include:
> > > > >
> > > > > 1. I've got no qualms about giving the whole world access to every
> > > > document
> > > > > in the index. There's nothing protected about anything.
> > > > > 2. The content can be easily rebuilt , it's not very large. (I can

Re: AJAX access to Solr Server

2015-12-25 Thread GW
If you are using Linux a simple one liner in IP tables

iptables -I INPUT \! --src www.yourwebserver.com -m tcp -p tcp --dport
8983 -j DROP


If windows, you can do something similar

otherwise it is very easy for anyone to delete all your documents with

http://yoursolrserver.com:8983/solr/your-core/update?stream.body=<delete><query>*:*</query></delete>&commit=true
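(And if you ever need to issue that delete deliberately from a script, note that the stream.body value must be URL-encoded. A Python sketch, with a hypothetical host and core name:)

```python
from urllib.parse import urlencode

# build the delete-all request with the XML body properly URL-encoded
params = urlencode({"stream.body": "<delete><query>*:*</query></delete>",
                    "commit": "true"})
url = "http://yoursolrserver.com:8983/solr/your-core/update?" + params
print(url)
```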




On 25 December 2015 at 20:42, Doug Turnbull <
dturnb...@opensourceconnections.com> wrote:

> Hi Shawn
>
> Maybe I should have qualified the parameters of scenarios this make me
> comfortable just proxying Solr directly w/o an API
>
> These situations include:
>
> 1. I've got no qualms about giving the whole world access to every document
> in the index. There's nothing protected about anything.
> 2. The content can be easily rebuilt , it's not very large. (I can easily
> push a button and make a new one)
>
> Sure you can denial of service Solr, and I might lose my search index. But
> you can denial of service anything. This includes just about anything you
> put in front of Solr. Moreover, the added complexity of a
> Drupal/WordPress/your-own-API layer can introduce security issues of its
> own. I'd rather keep it simple and have fewer moving parts.
>
> Cases where I would want an API in front of Solr (these are just the
> security ones):
> - I want to protect the content (ie based on some notion of a "user" or
> other permissions)
> - Rebuilding the content would be very hard and time consuming
>
> I would also say to expose Solr directly to everyone you probably should
> know about Solr's bugaboos:
> - the lovely qt parameter and the request dispatcher (the nginx proxy below
> disallows qt)
> - deep paging (prevented by the nginx proxy)
> - how to lock down a request handler fairly robustly, how to use invariants
> - mitigating intentionally malicious queries (such as the lovely "sleep"
> function query).
>
> I'm also curious to hear what the websolr people do, or anyone else that
> hosts Solr for the JavaScript app development crowd.
>
> Cheers
> -Doug
>
>
> On Friday, December 25, 2015, Shawn Heisey  wrote:
>
> > On 12/25/2015 12:17 PM, Eric Dain wrote:
> > > Does allowing javascript direct access to SolrCloud raise security
> > concern?
> > > should I build a REST service in between?
> > >
> > > I need to provide async search capability to web pages. the pages will
> be
> > > public with no authentication.
> >
> > End users should never have access to Solr.  Access to Solr from the
> > end-user machine is required if you want to accept Solr responses
> directly.
> >
> > In one of the other replies that you received, Doug has given you an
> > nginx config for proxying access to Solr -- indirect access.  This can
> > protect against *changes* to the index, and it has protection against
> > high start/rows values, but there are many other ways that an attacker
> > can construct denial of service queries, which this proxy config will
> > not prevent.
> >
> > I think that indirect access (through a proxy) should not be allowed
> > either, unless you can trust all the people that will have access.
> >
> > If Solr is open to a sufficiently wide audience (especially the
> > Internet), someone will find a way to abuse the service even with a
> > proxy, either to cause harm or to learn things they shouldn't know.
> >
> > The most secure option is to only allow the webservers and trusted
> > administrators to access Solr.  All end user (Internet) access to Solr
> > should be handled through a custom web application.  This might be
> > something that you find and install (such as wordpress, drupal, etc), or
> > one that you write yourself.
> >
> > You can still do AJAX while maintaining security.  You'll need to write
> > something in a server-side web programming language like PHP, Java, etc.
> >  This code will need to accept the AJAX requests from your client-side
> > javascript code, validate the request parameters to make sure they're
> > sane, get a response from Solr, and return relevant data.  If the
> > parameters don't validate, return an error, and handle that error
> > appropriately in the javascript code.
> >
> > Thanks,
> > Shawn
> >
> >
>
> --
> *Doug Turnbull **| *Search Relevance Consultant | OpenSource Connections
> , LLC | 240.476.9983
> Author: Relevant Search 
> This e-mail and all contents, including attachments, is considered to be
> Company Confidential unless explicitly stated otherwise, regardless
> of whether attachments are marked as such.
>
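The server-side wrapper Shawn describes (accept the AJAX request, validate parameters, then query Solr) can be sketched in a few lines of Python. The whitelist and limits below are illustrative assumptions, not settings from any real deployment:

```python
# Sketch of the parameter-validation step of a Solr proxy: keep only
# whitelisted parameters, cap paging, and reject anything suspicious.
from urllib.parse import parse_qs, urlencode

ALLOWED = {"q", "fq", "sort", "rows", "start"}  # drops qt, stream.body, etc.
MAX_ROWS, MAX_START = 50, 1000                  # blocks deep paging

def sanitize(query_string):
    """Return a safe query string to forward to Solr, or None to reject."""
    params = parse_qs(query_string, keep_blank_values=True)
    out = {}
    for key, values in params.items():
        if key in ALLOWED:
            out[key] = values[0]   # silently drop non-whitelisted keys
    if not out.get("q"):
        return None                # require a query
    try:
        if int(out.get("rows", 10)) > MAX_ROWS:
            out["rows"] = str(MAX_ROWS)   # clamp oversized pages
        if int(out.get("start", 0)) > MAX_START:
            return None            # deep paging: reject outright
    except ValueError:
        return None                # non-numeric rows/start
    return urlencode(out)
```

A request handler would call `sanitize()` on the incoming query string, return an error on None, and otherwise forward the result to Solr (e.g. with `urllib.request`) and relay the response back to the browser, so `qt`, `stream.body`, and huge `rows`/`start` values never reach Solr.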


Re: AJAX access to Solr Server

2015-12-25 Thread GW
I would put in a basic iptables statement to allow only your web server, to
prevent requests like

http://172.16.0.22:8983/solr/products/update?stream.body=%3Cdelete%3E%3Cquery%3E*:*%3C/query%3E%3C/delete%3E&commit=true

On 25 December 2015 at 14:58, Eric Dain  wrote:

> Thanks, that is very helpful.
>
> Have you tried denying access to some fields in the documents?
>
> On Fri, Dec 25, 2015 at 11:31 AM, Doug Turnbull <
> dturnb...@opensourceconnections.com> wrote:
>
> > We do this all the time, whitelisting only the read-only search endpoints
> > we want to support and disallowing excessively large paging.
> >
> > Here is a template for an nginx solr proxy. The read me describes more of
> > our philosophy
> >
> > https://github.com/o19s/solr_nginx
> >
> > On Friday, December 25, 2015, Eric Dain  wrote:
> >
> > > Hi all,
> > >
> > > Does allowing javascript direct access to SolrCloud raise security
> > concern?
> > > should I build a REST service in between?
> > >
> > > I need to provide async search capability to web pages. the pages will
> be
> > > public with no authentication.
> > >
> > > Happy searching,
> > > Eric
> > >
> >
> >
> > --
> > *Doug Turnbull **| *Search Relevance Consultant | OpenSource Connections
> > , LLC | 240.476.9983
> > Author: Relevant Search 
> > This e-mail and all contents, including attachments, is considered to be
> > Company Confidential unless explicitly stated otherwise, regardless
> > of whether attachments are marked as such.
> >
>


Newb from the iron age on a mission: Solr deployment

2015-11-24 Thread GW
I hope I am in the context of this mailing list,

Thanks in advance.

A little background

I learned computers with 6800 machine assembly. Even with decades of RDBMS
experience, jumping into Solr/Hadoop/HBase is still a pilgrimage through
hell. In hindsight, I don't think I needed to learn Hadoop or HBase.

So, I have a personal project that I think could go viral. This means I
have boatloads of uncertainties to deal with because there seem to be no
real guidelines on scaling and hardware selection. I cannot just pull out
my calculator. Ironically I have a similar problem in RDBs. The calculator
shows I am bordering on stupidity. Seeing as how I am also spending other
peoples money as well as my own you can guess I am nervous.

I will try to speak Apache now.

I have 4 cores that are completely flat, i.e., I will never join them. Two
of the cores will almost never change. One will be large. Extremely large.

This large core has a schema of 11 fields, 7 of which I need to store. It
seems crazy to offload this to HBase. Am I crazy? These indexes will be
updated weekly and regularly. Daily.

As it stands I plan to deploy a dedicated ZooKeeper ensemble of three
servers that can scale vertically to insanity but start with a minimal HW
config: a single quad-core Xeon on a dual-socket board, 2 x 16 GB sticks,
and an Intel SSD.

I'm planning 5 quad-core Solr boxes, all with Intel SSD drives.

From what I have read Intel makes the only SSD drive that supports caching
in RAID 0 and some people say they're happening. So I'm thinking 5 and
alive. 3 servers on JBODs and two on RAID 0.

I'm having a tough time not doing VMware on this. It's funny when I reflect
on going to Xen, KVM, VMware. Oh well, knowledge be damned, I'm back in the
iron age lol. These indexes do not need to be backed up so why should I
care. I only need to worry about my crawler's DB, so I'm not worried about
backups. It's all in my comfort zone.

Ah! So SolrCloud is a petri dish! I hear people yelling, "jump in, the
water's fine!" I still want to VM. Am I better off in the land of tarballs
and configs? Can I use Linux volume manager?

My primary concern is whether I should use HBase. Everything is screaming no
at me. It just looks like a useless abstraction to me. I think I am a pure
Solr project after grasping Hadoop/HBase to some degree. At the end of the
day I know $h1t3. I'm only one guy, so if I can skip Hadoop/HBase management
I will. I think I fit the criteria for just being a search engine.

I almost wish I never did the Solr/Nutch/HBase tutorial.

Any criticism or comments appreciated.

:-)