Re: Question on index time de-duplication

2015-10-29 Thread Zheng Lin Edwin Yeo
Yes, you can try using the SignatureUpdateProcessorFactory to hash the
content into a signature field, and then group on the signature field during
your search.

You can find more information here:
https://cwiki.apache.org/confluence/display/solr/De-Duplication

I have been using this method to group indexed documents with duplicated
content, and it is working fine.
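
For reference, a minimal chain along the lines described on that wiki page
(the chain, field, and fields-to-hash names below are just examples; the
signature field must also exist in your schema):

```xml
<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">signatureField</str>
    <!-- keep duplicates in the index; we only group at query time -->
    <bool name="overwriteDupes">false</bool>
    <str name="fields">content</str>
    <str name="signatureClass">solr.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```

At query time you would then add something like
group=true&group.field=signatureField to collapse the duplicates.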

Regards,
Edwin


On 30 October 2015 at 07:20, Shamik Bandopadhyay  wrote:

> Hi,
>
>   I'm looking to customize index-time de-duplication. Here's my use case
> and what I'm trying to achieve.
>
> I have identical documents coming from different release years of a given
> product. I need to index them all in Solr, as they are required in individual
> year contexts. But there's a generic search which spans all the years
> and hence brings back duplicate/identical content. My goal is to return only
> the latest document and filter out the rest. E.g., if product A has
> identical documents for 2015, 2014 and 2013, search should only return 2015
> (latest document) and filter out the rest.
>
> What I'm thinking (if possible) during index time:
>
> Index all documents, but add a special tag (e.g. dedup=true) to 2013 and
> 2014 content, keeping 2015 (the latest release) untouched. During query
> time, I'll add a filter which will exclude contents tagged with "dedup".
>
> Just wondering if this is achievable by perhaps extending
> UpdateRequestProcessorFactory or
> customizing SignatureUpdateProcessorFactory ?
>
> Any pointers will be appreciated.
>
> Regards,
> Shamik
>


Re: Solr 5.3.1 CREATE defaults to schema-less mode Java version 1.7.0_45

2015-10-29 Thread Erick Erickson
I'm pretty confused about what you're trying to do. You mention using
the SolrCloud UI to look at your core, but on the other hand you also
mention using the core admin to create the core.

Trying to use the core admin commands with SolrCloud is a recipe for
disaster. Under the covers, the _collections_ api does, indeed, use
the core admin API to create cores, but it really must be precisely
done. If you're going to try to create your own cores, I recommend
setting up a non-SolrCloud system.

If you want to use SolrCloud, then I _strongly_ recommend you use the
collections API to create your collections. You can certainly have a
single-shard collection that would be a leader-only collection (i.e.
no followers), which would have only a single core cluster-wide if
that fits your architecture.
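
For instance, a single-shard, single-replica collection can be created with
the Collections API like this (collection and config names here are
placeholders):

```
http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=1&replicationFactor=1&collection.configName=myconf
```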

As it is, in cloud mode Solr expects the configs to be up on
Zookeeper, not resident on disk somewhere. And the admin core create
command promises that you have the configs in
/Users/nw/Downloads/twc-session-dash/collection1 which is a recipe for
confusion on Solr's part...
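
If you do want to go the SolrCloud route, the conf directory can be uploaded
to ZooKeeper with the zkcli script shipped with Solr, e.g. (paths, host, and
config name are placeholders):

```
server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:2181 \
  -cmd upconfig -confdir /path/to/conf -confname myconf
```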

HTH,
Erick

On Thu, Oct 29, 2015 at 4:24 PM, natasha  wrote:
> Note, if I attempt to CREATE the core using Solr 5.3.0 on my openstack
> machine (Java version 1.7.0) I have no issues.
>
>
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr-5-3-1-CREATE-defaults-to-schema-less-mode-Java-version-1-7-0-45-tp4237305p4237307.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Problem with the Content Field during Solr Indexing

2015-10-29 Thread Zheng Lin Edwin Yeo
The "\n" actually means a new line, as decoded by Solr from the indexed
document.

What is the file extension of your image file, and which method are you
using to do the indexing?

Regards,
Edwin


On 30 October 2015 at 04:38, Shruti Mundra  wrote:

> Hi,
>
> When I'm trying to index an image file directly to Solr, the attribute
> content, consists of trails of "\n"s and not the data.
> We are successful in getting the metadata for that image.
>
> Can anyone help us out on how we could get the content along with the
> Metadata.
>
> Thanks!
>
> - Shruti Mundra
>


Re: Solr 5.3.1 CREATE defaults to schema-less mode Java version 1.7.0_45

2015-10-29 Thread natasha
Note, if I attempt to CREATE the core using Solr 5.3.0 on my openstack
machine (Java version 1.7.0) I have no issues.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-5-3-1-CREATE-defaults-to-schema-less-mode-Java-version-1-7-0-45-tp4237305p4237307.html
Sent from the Solr - User mailing list archive at Nabble.com.


Question on index time de-duplication

2015-10-29 Thread Shamik Bandopadhyay
Hi,

  I'm looking to customize index-time de-duplication. Here's my use case
and what I'm trying to achieve.

I have identical documents coming from different release years of a given
product. I need to index them all in Solr, as they are required in individual
year contexts. But there's a generic search which spans all the years
and hence brings back duplicate/identical content. My goal is to return only
the latest document and filter out the rest. E.g., if product A has
identical documents for 2015, 2014 and 2013, search should only return 2015
(latest document) and filter out the rest.

What I'm thinking (if possible) during index time:

Index all documents, but add a special tag (e.g. dedup=true) to 2013 and
2014 content, keeping 2015 (the latest release) untouched. During query
time, I'll add a filter which will exclude contents tagged with "dedup".

Just wondering if this is achievable by perhaps extending
UpdateRequestProcessorFactory or
customizing SignatureUpdateProcessorFactory ?
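
If the SignatureUpdateProcessorFactory route is taken, the query side could
also collapse the duplicates with result grouping instead of a dedup tag,
e.g. (field names here are hypothetical):

```
q=...&group=true&group.field=signatureField&group.sort=year desc&group.limit=1
```

With group.sort=year desc and group.limit=1, only the latest document in
each group of identical content would be returned.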

Any pointers will be appreciated.

Regards,
Shamik


Solr 5.3.1 CREATE defaults to schema-less mode Java version 1.7.0_45

2015-10-29 Thread natasha
Hi,

I just downloaded Solr 5.3.1, and after starting Solr with the following
command:

bin/solr start -p 8985

I attempt to CREATE a Solr-core with the following command:

curl
'http:localhost:8985/solr/admin/cores?action=CREATE&name=test-core&instanceDir=/Users/nw/Downloads/twc-session-dash/collection1'

where twc-session-dash directory has

  
conf/
  schema.xml
  solrconfig.xml
data/index/
  _2.fdt
  _2.fnm
  _2_Lucene41_0.doc
  _2_Lucene41_0.tip
  _2.fdx
  _2.si
  _2_Lucene41_0.tim
  segments_4

solrconfig.xml has the following content:
1024100012147483647truefalsetermsLUCENE_41

And see schema.xml at bottom of post:

And the core is ostensibly created without an issue (numDocs is 59, which is
correct, and no error is logged). But when I ping the newly created core's
schema via:

curl 'http://localhost:8985/solr/test-core/schema/fields'

Only id, _root_, _text_, and _version_ are returned. When I query a known
field in my schema.xml like _id, I am told "undefined field _id."

Among the changes to the directory, Solr 5.3.1 has populated a
core.properties file at the same level as conf and data, with content:
#Written by CorePropertiesLocator
#Thu Oct 29 22:53:57 UTC 2015
name=test-core
coreNodeName=core_node1

It has also added tlog/ at the same level as index, and write.lock within index.

I investigated the SolrCloud UI and noted that my schema fields, as well as
the default fields, are mixed together in the Schema Browser dropdown for
test-core. However, the only fields available in the Analyse Fieldname /
FieldType dropdown in the Analysis tab are the default fields.

I am running Java version 1.7.0_51 on this machine, which is an openstack
machine.

When I install Solr 5.3.1 on my local machine, which is running Java version
"1.8.0_45" I have no issue. I am able to CREATE the core, I see my and only
my fields when I query the schema, and am able to query these fields no
problem.

Please advise how I can CREATE solr indices on my openstack machine. Thank
you,

_id_id



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-5-3-1-CREATE-defaults-to-schema-less-mode-Java-version-1-7-0-45-tp4237305.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr collection alias - how rank is affected

2015-10-29 Thread Ronald Xiao
Use global IDF if the data distribution is not even.

On Tuesday, October 27, 2015, Markus Jelsma 
wrote:

> Hello - regarding fairly random/smooth distribution, you will notice it
> for sure. A solution there is to use distributed collection statistics. On
> top of that you might want to rely on docCount, not maxDoc inside your
> similarity implementation, because docCount should be identical in both
> collections. maxDoc is not really deterministic, it seems, since identical
> replicas do not merge segments at the same time.
>
> Markus
>
>
> -Original message-
> > From:Scott Stults >
> > Sent: Tuesday 27th October 2015 21:18
> > To: solr-user@lucene.apache.org 
> > Subject: Re: Solr collection alias - how rank is affected
> >
> > Collection statistics aren't shared between collections, so there's going
> > to be a difference. However, if the distribution is fairly random you
> won't
> > notice.
> >
> > On Tue, Oct 27, 2015 at 3:21 PM, SolrUser1543  > wrote:
> >
> > > How is document ranking is affected when using a collection alias for
> > > searching on two collections with same schema ? is it affected at all
> ?
> > >
> > >
> > >
> > > --
> > > View this message in context:
> > >
> http://lucene.472066.n3.nabble.com/Solr-collection-alias-how-rank-is-affected-tp4236776.html
> > > Sent from the Solr - User mailing list archive at Nabble.com.
> > >
> >
> >
> >
> > --
> > Scott Stults | Founder & Solutions Architect | OpenSource Connections,
> LLC
> > | 434.409.2780
> > http://www.opensourceconnections.com
> >
>


Re: restore quorum after majority of zk nodes down

2015-10-29 Thread Pushkar Raste
How about having, let's say, 4 nodes on each side and making one node in one
of the data centers an observer? When the data center with the majority of
the nodes goes down, bounce the observer by reconfiguring it as a voting
member.

You will later have to revert the observer back to being an observer.

There will be a short outage as far as indexing is concerned but queries
should continue to work and you don't have to take all the zookeeper nodes
down.
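
A sketch of the zoo.cfg for this idea, simplified to two nodes per data
center (host names are placeholders): the observer sets peerType=observer
locally, and its server line carries the :observer suffix on every node:

```
# zoo.cfg on the observer node only:
peerType=observer

# server entries in every node's zoo.cfg:
server.1=zk1.dc1:2888:3888
server.2=zk2.dc1:2888:3888
server.3=zk3.dc2:2888:3888
server.4=zk4.dc2:2888:3888:observer
```

Promoting the observer then means removing those observer markers and
restarting it as a voting member.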

-- Pushkar Raste
On Oct 29, 2015 4:33 PM, "Matteo Grolla"  wrote:

> Hi Walter,
>   it's not a problem to take down zk for a short (1h) time and
> reconfigure it. Meanwhile, Solr would go into read-only mode.
> I'd like feedback on the fastest way to do this. Would it work to just
> reconfigure the cluster with 2 other empty zk nodes? Would they correctly
> sync from the non-empty one? Or should I first copy data from zk3 to the two
> empty zk nodes?
> Matteo
>
>
> 2015-10-29 18:34 GMT+01:00 Walter Underwood :
>
> > You can't. Zookeeper needs a majority. One node is not a majority of a
> > three node ensemble.
> >
> > There is no way to split a Solr Cloud cluster across two datacenters and
> > have high availability. You can do that with three datacenters.
> >
> > You can probably bring up a new Zookeeper ensemble and configure the Solr
> > cluster to talk to it.
> >
> > wunder
> > Walter Underwood
> > wun...@wunderwood.org
> > http://observer.wunderwood.org/  (my blog)
> >
> >
> > > On Oct 29, 2015, at 10:08 AM, Matteo Grolla 
> > wrote:
> > >
> > > I'm designing a solr cloud installation where nodes from a single
> cluster
> > > are distributed on 2 datacenters which are close and very well
> connected.
> > > let's say that zk nodes zk1, zk2 are on DC1 and zk3 is on DC2 and let's
> > say
> > > that DC1 goes down and the cluster is left with zk3.
> > > how can I restore a zk quorum from this situation?
> > >
> > > thanks
> >
> >
>


Re: Closing Windows CMD kills Solr

2015-10-29 Thread Timothy Potter
Would launching the Java process with javaw help here?

On Thu, Oct 29, 2015 at 4:03 AM, Zheng Lin Edwin Yeo
 wrote:
> Yes, this is the expected behaviour. Once you close the command window,
> Solr will stop running. This has happened to me several times. Just to
> check, which version of Solr are you using?
>
> I have tried NSSM before, and it works for Solr 5.0 and Solr 5.1. However,
> when I move up to Solr 5.3.0, I wasn't able to use the same method to start
> it as a service using NSSM.
>
> Do let me know if you managed to set up NSSM for Solr 5.3.0?
>
> Regards,
> Edwin
>
>
> On 29 October 2015 at 17:22, Charlie Hull  wrote:
>
>> On 29/10/2015 00:12, Steven White wrote:
>>
>>> Hi Folks,
>>>
>>> I don't understand if this is an expected behavior or not.
>>>
>>> On Windows, I start Solr from a command prompt like so:
>>>
>>>  bin\solr start -p 8983 -s C:\MySolrIndex
>>>
>>> Now, once I close the command prompt the Java process that started Solr is
>>> killed.  is this expected?  How do I keep Solr alive when I close the
>>> command prompt?
>>>
>>> This does not happen on Linux.
>>>
>>> Thanks in advance.
>>>
>>> Steve
>>>
>>
>> Yes, this is expected behaviour. If you want to prevent it you either
>> need to keep the command window open, or run Solr as a Windows Service
>> (kinda like a Unix daemon, but more annoying). To do the latter I would
>> recommend NSSM - here are some links that will help:
>>
>> http://mikerobbins.co.uk/2014/11/07/install-solr-as-windows-service-for-sitecore-content-search/
>> (from step 2)
>> http://www.norconex.com/how-to-run-solr5-as-a-service-on-windows/
>>
>> HTH
>>
>> Charlie
>>
>>
>>
>> --
>> Charlie Hull
>> Flax - Open Source Enterprise Search
>>
>> tel/fax: +44 (0)8700 118334
>> mobile:  +44 (0)7767 825828
>> web: www.flax.co.uk
>>


Using geotopic parser

2015-10-29 Thread Salonee Rege
We are using the geotopic parser on HTML pages. Does the geotopic parser
only take .geot files? Kindly help.
*Salonee Rege*
USC Viterbi School of Engineering
University of Southern California
Master of Computer Science - Student
Computer Science - B.E
salon...@usc.edu  *||* *619-709-6756*


Re: org.apache.solr.common.SolrException: Document is missing mandatory uniqueKey field: id

2015-10-29 Thread Shawn Heisey
On 10/29/2015 1:54 AM, fabigol wrote:
> hi,
> thank to your reply
> When you says 
> 'You must have a field labeled "id" in the doc sent to Solr'. it's in the
> response of the select that i must get an "id"?  i must write "select
> 'something' as ID" is it good???
> in schema.xml i have the following line
> ID
>  end
> my data-import file i have
> 
>
> I think that I must map the column ID in my data-import file. I make the
> mapping with the help of "select something as id..."
> Is that right?

Solr's schema has the field, the problem is that the result from the
database query, after the dataimport handler is done with it, does not
contain the ID field for at least one document.  In your DIH config, I
saw this:

<field column="tiers_id" name="id"/>

Either that field mapping isn't working, or the tiers_id db column is
NULL/missing on at least one of your database rows, which results in the
ID field being absent from the Solr document that the dataimport handler
sends to Solr.
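
One way to guard against that on the database side is to exclude NULL ids in
the entity query itself. A sketch (the table name and extra columns here are
guesses; only the tiers_id mapping comes from your config):

```xml
<entity name="tiers"
        query="SELECT tiers_id, ... FROM tiers WHERE tiers_id IS NOT NULL">
  <field column="tiers_id" name="id"/>
</entity>
```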

Thanks,
Shawn



Re: org.apache.solr.common.SolrException: Document is missing mandatory uniqueKey field: id

2015-10-29 Thread Erick Erickson
When you have a <uniqueKey> defined in schema.xml, that
is a field that must exist in every doc you send
to Solr.

Actually, you'll see in the definition for the <uniqueKey> field that
it has required="true" set.

Assuming you have a column "carte_id" being selected for all
docs inserted into Solr that should probably work.

My guess is that at least some records do NOT have this value set.
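
For reference, the relevant schema.xml declarations typically look like this
(the field name and type here are the common defaults; adjust to your
schema):

```xml
<field name="id" type="string" indexed="true" stored="true" required="true"/>
<uniqueKey>id</uniqueKey>
```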

Best,
Erick

On Thu, Oct 29, 2015 at 12:54 AM, fabigol  wrote:
> hi,
> thank to your reply
> When you says
> 'You must have a field labeled "id" in the doc sent to Solr'. it's in the
> response of the select that i must get an "id"?  i must write "select
> 'something' as ID" is it good???
> in schema.xml i have the following line
> ID
>  end
> my data-import file i have
> 
>
> I think that I must map the column ID in my data-import file. I make the
> mapping with the help of "select something as id..."
> Is that right?
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/org-apache-solr-common-SolrException-Document-is-missing-mandatory-uniqueKey-field-id-tp4237067p4237147.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrJ stalls/hangs on client.add(); and doesn't return

2015-10-29 Thread Erick Erickson
You're sending 100K docs in a single packet? It's vaguely possible that you're
getting a timeout although that doesn't square with no docs being indexed...

Hmmm, to check you could do a manual commit. Or watch the Solr log to
see if update
requests ever go there.

Or you're running out of memory on the client.

Or even exceeding the packet size that the servlet container will accept?

But I think at root you're misunderstanding
ConcurrentUpdateSolrClient. It doesn't
partition up a huge array and send the pieces in parallel; it parallelizes
sending the packet each call is given. So it's trying to send all 100K docs
at once. Probably not what you were aiming for.

Try making batches of 1,000 docs and sending them through instead.

So the parameters are a bit of magic. You can have up to the number of
threads you specify sending their entire packet to Solr in parallel, and up
to queueSize requests queued. Note this is the _request_, not the docs in
the list, if I'm reading the code correctly.
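
A minimal sketch of that batching idea in plain Java (the batches helper
below is hypothetical, not part of SolrJ; in real code each batch would be
sent with client.add(batch), with an occasional commit):

```java
import java.util.ArrayList;
import java.util.List;

public class BatchIndexer {
    // Split a large document list into fixed-size batches so each
    // client.add(...) call sends a bounded request instead of 100K docs.
    static <T> List<List<T>> batches(List<T> docs, int batchSize) {
        List<List<T>> out = new ArrayList<>();
        for (int i = 0; i < docs.size(); i += batchSize) {
            out.add(docs.subList(i, Math.min(i + batchSize, docs.size())));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> docs = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) docs.add(i);
        // In real code: for (List<Integer> batch : batches(docs, 1000))
        //                   client.add(toSolrDocs(batch));
        System.out.println(batches(docs, 1000).size()); // 100 batches
    }
}
```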

Best,
Erick

On Thu, Oct 29, 2015 at 1:52 AM, Markus Jelsma
 wrote:
> Hello - we have some processes periodically sending documents to 5.3.0 in 
> local mode using ConcurrentUpdateSolrClient 5.3.0, it has queueSize 10 and 
> threadCount 4, just chosen arbitrarily having no idea what is right.
>
> Usually its a few thousand up to some tens of thousands of rather small 
> documents. Now, when the number of documents is around or near a hundred 
> thousand, client.add(Iterator<SolrInputDocument> docIterator) stalls and
> never returns. It also doesn't index any of the documents. Upon calling, it 
> quickly eats CPU and a load of heap but shortly after it goes idle, no CPU 
> and memory is released.
>
> I am puzzled, any ideas to share?
> Markus


Re: Solr for Pictures

2015-10-29 Thread Rallavagu
I was playing with exiftool (written in Perl) and a custom Java class
built using the metadata-extractor project
(https://github.com/drewnoakes/metadata-extractor), and wondering if
there is anything built into Solr, or any best practices
(general practices) for indexing pictures.


On 10/29/15 1:56 PM, Daniel Valdivia wrote:

Some extra googling yielded this wiki page about an integration between
Tika and ExifTool:

https://wiki.apache.org/tika/EXIFToolParser 



On Oct 29, 2015, at 1:48 PM, Daniel Valdivia  wrote:

I think you can look into Tika for this https://tika.apache.org/ 


There’s handlers to integrate Tika and Solr, some context:

https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Solr+Cell+using+Apache+Tika
 





On Oct 29, 2015, at 1:47 PM, Rallavagu mailto:rallav...@gmail.com>> wrote:

In general, is there a built-in data handler to index pictures (essentially, 
EXIF and other data embedded in an image)? If not, what is the best practice to 
do so? Thanks.







Re: Solr for Pictures

2015-10-29 Thread Daniel Valdivia
Some extra googling yielded this wiki page about an integration between
Tika and ExifTool:

https://wiki.apache.org/tika/EXIFToolParser 


> On Oct 29, 2015, at 1:48 PM, Daniel Valdivia  wrote:
> 
> I think you can look into Tika for this https://tika.apache.org/ 
> 
> 
> There’s handlers to integrate Tika and Solr, some context:
> 
> https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Solr+Cell+using+Apache+Tika
>  
> 
> 
> 
> 
>> On Oct 29, 2015, at 1:47 PM, Rallavagu > > wrote:
>> 
>> In general, is there a built-in data handler to index pictures (essentially, 
>> EXIF and other data embedded in an image)? If not, what is the best practice 
>> to do so? Thanks.
> 



RE: Solr for Pictures

2015-10-29 Thread Markus Jelsma


Hi - Solr does integrate with Apache Tika, which happily accepts images and
other media formats. I am not sure if EXIF is exposed, but you might want
to try. Otherwise, patch it up or use Tika in your own process that indexes
data to Solr.

https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Solr+Cell+using+Apache+Tika
 
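
A typical Solr Cell request for an image would look like this (the core
name and file are placeholders; which EXIF fields actually come back depends
on the Tika parsers on the classpath):

```
curl 'http://localhost:8983/solr/mycore/update/extract?literal.id=img1&commit=true' \
  -F "myfile=@photo.jpg"
```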
-Original message-
> From:Rallavagu 
> Sent: Thursday 29th October 2015 21:47
> To: solr-user@lucene.apache.org
> Subject: Solr for Pictures
> 
> In general, is there a built-in data handler to index pictures 
> (essentially, EXIF and other data embedded in an image)? If not, what is 
> the best practice to do so? Thanks.
> 


Re: problem with solr auto add core after restart

2015-10-29 Thread Erick Erickson
What errors, if any, do you see in the Solr logs? The information here isn't
enough to say much.

Best,
Erick

On Thu, Oct 29, 2015 at 7:44 AM, sara hajili  wrote:
> hi,
> i added this in the solr.xml file:
> ${coreRootDirectory:} 
> and in each core i added this to core.properties:
> loadOnStartup=true
>
> Now, if i stop and start Solr from solr_home/bin, after the restart Solr
> starts and automatically adds the cores. But when i restart the Solr service
> in Linux with:
> service solr restart
> Solr starts but doesn't add the cores automatically.
> How can i solve this so that stopping and starting the Solr service also
> automatically adds the cores?
> Please help me, thanks.


Re: Solr for Pictures

2015-10-29 Thread Daniel Valdivia
I think you can look into Tika for this https://tika.apache.org/ 


There’s handlers to integrate Tika and Solr, some context:

https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Solr+Cell+using+Apache+Tika
 




> On Oct 29, 2015, at 1:47 PM, Rallavagu  wrote:
> 
> In general, is there a built-in data handler to index pictures (essentially, 
> EXIF and other data embedded in an image)? If not, what is the best practice 
> to do so? Thanks.



Solr for Pictures

2015-10-29 Thread Rallavagu
In general, is there a built-in data handler to index pictures 
(essentially, EXIF and other data embedded in an image)? If not, what is 
the best practice to do so? Thanks.


Fetching data from the Geotopic Parser

2015-10-29 Thread Shruti Mundra
Hi,

We have started a geotopic parser on a specific port, and when we try to
get the data using a socket connection we receive this error message:

"Connected

Message send

HTTP/1.1 400 Bad Request

Connection: close

Server: Jetty(8.y.z-SNAPSHOT)


Error: 400"

What could be the reason for this?

Thanks and Regards,

Shruti Mundra


Problem with the Content Field during Solr Indexing

2015-10-29 Thread Shruti Mundra
Hi,

When I'm trying to index an image file directly to Solr, the attribute
content, consists of trails of "\n"s and not the data.
We are successful in getting the metadata for that image.

Can anyone help us out on how we could get the content along with the
Metadata.

Thanks!

- Shruti Mundra


Re: restore quorum after majority of zk nodes down

2015-10-29 Thread Matteo Grolla
Hi Walter,
  it's not a problem to take down zk for a short (1h) time and
reconfigure it. Meanwhile, Solr would go into read-only mode.
I'd like feedback on the fastest way to do this. Would it work to just
reconfigure the cluster with 2 other empty zk nodes? Would they correctly
sync from the non-empty one? Or should I first copy data from zk3 to the two
empty zk nodes?
Matteo


2015-10-29 18:34 GMT+01:00 Walter Underwood :

> You can't. Zookeeper needs a majority. One node is not a majority of a
> three node ensemble.
>
> There is no way to split a Solr Cloud cluster across two datacenters and
> have high availability. You can do that with three datacenters.
>
> You can probably bring up a new Zookeeper ensemble and configure the Solr
> cluster to talk to it.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>
> > On Oct 29, 2015, at 10:08 AM, Matteo Grolla 
> wrote:
> >
> > I'm designing a solr cloud installation where nodes from a single cluster
> > are distributed on 2 datacenters which are close and very well connected.
> > > let's say that zk nodes zk1, zk2 are on DC1 and zk3 is on DC2 and let's
> say
> > that DC1 goes down and the cluster is left with zk3.
> > how can I restore a zk quorum from this situation?
> >
> > thanks
>
>


Re: restore quorum after majority of zk nodes down

2015-10-29 Thread Walter Underwood
You can't. Zookeeper needs a majority. One node is not a majority of a three 
node ensemble.

There is no way to split a Solr Cloud cluster across two datacenters and have 
high availability. You can do that with three datacenters.

You can probably bring up a new Zookeeper ensemble and configure the Solr 
cluster to talk to it.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Oct 29, 2015, at 10:08 AM, Matteo Grolla  wrote:
> 
> I'm designing a solr cloud installation where nodes from a single cluster
> are distributed on 2 datacenters which are close and very well connected.
> let's say that zk nodes zk1, zk2 are on DC1 and zk3 is on DC2 and let's say
> that DC1 goes down and the cluster is left with zk3.
> how can I restore a zk quorum from this situation?
> 
> thanks



Index Metatags in Nutch site.xml

2015-10-29 Thread Salonee Rege
We have finished running the bin/nutch solrindex command on our Nutch
segments, and the data is getting indexed. I followed this link:
https://wiki.apache.org/nutch/IndexMetatags . The metatags description and
keywords were the sample ones we used, but they are not getting indexed.
What could be the problem?

Thanks and Regards,
*Salonee Rege*
USC Viterbi School of Engineering
University of Southern California
Master of Computer Science - Student
Computer Science - B.E
salon...@usc.edu  *||* *619-709-6756*


restore quorum after majority of zk nodes down

2015-10-29 Thread Matteo Grolla
I'm designing a solr cloud installation where nodes from a single cluster
are distributed on 2 datacenters which are close and very well connected.
let's say that zk nodes zk1, zk2 are on DC1 and zk3 is on DC2 and let's say
that DC1 goes down and the cluster is left with zk3.
how can I restore a zk quorum from this situation?

thanks


Re: solr 5.3.0 master-slave: TWO segments after optimize

2015-10-29 Thread Andrii Berezhynskyi
Erick, they are not going away after reload.

Emir, increased cpu and response time are on slaves. On all slaves. Here is
a thread dump
https://gist.github.com/andriiberezhynskyi/739d59cf78b043d653da (though not
sure that I did it right, I used jstack -F PID).

Thanks for your help!


Looking for SOLR consulting help

2015-10-29 Thread William Bell
Healthgrades is looking for Solr consulting assistance.

Rate is negotiable based on skillsets. $125 - $175/hr

We are flexible on time.

1. Solr 5.x experience
2. Tuning performance
3. Relevancy and Autosuggest experience
4. Move to Solr Cloud
5. Amazon Linux Debian experience
6. Java experience using InteliJ and SOLRJ
7. Review requirements and implement changes to Solr cores/collections

Please email me directly, no agents.

Needed for 3-6 months. Could go full time.


-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


Re: Stem Words Highlighted - Keyword Not Highlighted

2015-10-29 Thread Jack Krupansky
Did you index the data before adding the word delimiter filter? The white
space tokenizer preserves the period after "stocks.", but the WDF should
remove it. The period is likely interfering with stemming.

Are your filters the same for index time and query time?

-- Jack Krupansky

On Tue, Aug 18, 2015 at 3:31 PM, Ann B  wrote:

> Question:
>
> Can I configure solr to highlight the keyword also?  The search results are
> correct, but the highlighting is not complete.
>
> *
>
> Example:
>
> Keyword: stocks
>
> Request: (I only provided the url parameters below.)
>
> hl=true&
> hl.fl=spell&
> hl.simple.pre=%5BHIGHLIGHT%5D&
> hl.simple.post=%5B%2FHIGHLIGHT%5D&
> hl.snippets=3&
> hl.fragsize=70&
> hl.mergeContiguous=true&
>
> fl=item_id%2Cscore&
>
> qf=tm_body%3Avalue%5E1.0&
> qf=tm_title%5E13.0&
>
> fq=im_field_webresource_category%3A%226013%22&
> fq=index_id%3Atest&
>
>
> start=0&rows=10&facet=true&facet.sort=count&facet.limit=10&facet.mincount=1&facet.missing=false&facet.field=im_field_webresource_category&f.im_field_webresource_category.facet.limit=50&
>
> wt=json&json.nl=map&
>
> q=%22stocks%22
>
> *
>
> Response:
>
> "highlighting":{
> "test-49904":{"spell":[
> "Includes free access to [HIGHLIGHT]stock[/HIGHLIGHT] charts and
> instruction about using [HIGHLIGHT]stock[/HIGHLIGHT] charts in technical
> analysis of stocks. Paid subscriptions provide access to more
> information."]},...
>
> *
>
> Details:
>
> Tokenizer: <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>
> Filters:
>
> <filter … ignoreCase="true" expand="true"/>
> <filter … ignoreCase="true" words="stopwords.txt"
> enablePositionIncrements="true"/>
> <filter … generateWordParts="1" generateNumberParts="1" catenateWords="0"
> catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"
> preserveOriginal="1"/>
> <filter … max="100"/>
> <filter class="solr.SnowballPorterFilterFactory" language="English"
> protected="protwords.txt"/>
> <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>
> I think I'm using the Standard Highlighter.
>
> I’m using the Drupal 7 search api solr configuration files without
> modification.
>
>
> Thank you,
>
> Ann
>


Re: language plugin

2015-10-29 Thread Jack Krupansky
Are you trying to do an atomic update without the content field? If so, it
sounds like Solr needs an enhancement (bug fix?) so that language detection
would be skipped if the input field is not present. Or maybe that could be
an option.
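
For reference, a typical langid chain matching the parameters mentioned in
the thread looks like this (field names are examples; langid.fl,
langid.langField, langid.overwrite, and langid.fallback are documented
parameters):

```xml
<updateRequestProcessorChain name="langid">
  <processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
    <!-- field(s) whose text is fed to the language detector -->
    <str name="langid.fl">content</str>
    <str name="langid.langField">language</str>
    <bool name="langid.overwrite">false</bool>
    <str name="langid.fallback">en</str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```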


-- Jack Krupansky

On Thu, Oct 29, 2015 at 3:25 AM, Chaushu, Shani 
wrote:

> Hi,
>  I'm using the Solr language detection plugin on the field named "content"
> (Solr 4.10, plugin LangDetectLanguageIdentifierUpdateProcessorFactory).
> When I'm indexing for the first time it works fine, but if I want to set
> one field again (regardless of whether it's the content or not) it goes to
> its default language. If I'm setting another field I would like the language
> to stay the way it was before, and I don't want to insert all the content
> again. Is there an option to set the plugin so that it won't calculate the
> language again? (Setting langid.overwrite to false didn't work.)
>
> Thanks,
> Shani
>
>
> -
> Intel Electronics Ltd.
>
> This e-mail and any attachments may contain confidential material for
> the sole use of the intended recipient(s). Any review or distribution
> by others is strictly prohibited. If you are not the intended
> recipient, please contact the sender and delete all copies.
>


Re: language plugin

2015-10-29 Thread Alexandre Rafalovitch
Could you post your full chain definition? It's an interesting
problem, but hard to answer without seeing the exact current
configuration.

Regards,
   Alex.

Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/

On 29 October 2015 at 03:25, Chaushu, Shani  wrote:
> Hi,
>  I'm using the Solr language detection plugin on the field named "content"
> (Solr 4.10, plugin LangDetectLanguageIdentifierUpdateProcessorFactory).
> When I'm indexing for the first time it works fine, but if I want to set one
> field again (regardless of whether it's the content or not) it goes to its
> default language. If I'm setting another field I would like the language to
> stay the way it was before, and I don't want to insert all the content again.
> Is there an option to set the plugin so that it won't calculate the language
> again? (Setting langid.overwrite to false didn't work.)
>
> Thanks,
> Shani


problem with solr auto add core after restart

2015-10-29 Thread sara hajili
hi,
i added this in the solr.xml file:
${coreRootDirectory:} 
and in each core i added this to core.properties:
loadOnStartup=true

Now, if i stop and start Solr from solr_home/bin, after the restart Solr
starts and automatically adds the cores. But when i restart the Solr service
in Linux with:
service solr restart
Solr starts but doesn't add the cores automatically.
How can i solve this so that stopping and starting the Solr service also
automatically adds the cores?
Please help me, thanks.


Re: Closing Windows CMD kills Solr

2015-10-29 Thread Zheng Lin Edwin Yeo
Yes, this is the expected behaviour. Once you close the command window,
Solr will stop running. This has happened to me several times. Just to
check, which version of Solr are you using?

I have tried NSSM before, and it works for Solr 5.0 and Solr 5.1. However,
when I move up to Solr 5.3.0, I wasn't able to use the same method to start
it as a service using NSSM.

Do let me know if you managed to set up NSSM for Solr 5.3.0?

Regards,
Edwin


On 29 October 2015 at 17:22, Charlie Hull  wrote:

> On 29/10/2015 00:12, Steven White wrote:
>
>> Hi Folks,
>>
>> I don't understand if this is an expected behavior or not.
>>
>> On Windows, I start Solr from a command prompt like so:
>>
>>  bin\solr start -p 8983 -s C:\MySolrIndex
>>
>> Now, once I close the command prompt, the Java process that started Solr is
>> killed. Is this expected? How do I keep Solr alive when I close the
>> command prompt?
>>
>> This does not happen on Linux.
>>
>> Thanks in advance.
>>
>> Steve
>>
> Yes, this is expected behaviour. If you want to prevent it you either
> need to keep the command window open, or run Solr as a Windows Service
> (kinda like a Unix daemon, but more annoying). To do the latter I would
> recommend NSSM - here are some links that will help:
>
> http://mikerobbins.co.uk/2014/11/07/install-solr-as-windows-service-for-sitecore-content-search/
> (from step 2)
> http://www.norconex.com/how-to-run-solr5-as-a-service-on-windows/
>
> HTH
>
> Charlie
>
>
>
> --
> Charlie Hull
> Flax - Open Source Enterprise Search
>
> tel/fax: +44 (0)8700 118334
> mobile:  +44 (0)7767 825828
> web: www.flax.co.uk
>
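For the record, the NSSM setup described in the links above usually boils down to a few commands along these lines. The install path, service name, and port are assumptions; the -f flag keeps Solr in the foreground, which a service wrapper generally needs:

```
nssm install Solr "C:\solr-5.3.1\bin\solr.cmd" start -f -p 8983
nssm set Solr AppDirectory "C:\solr-5.3.1"
nssm start Solr
```

This is a sketch, not a verified recipe for 5.3.0 specifically; as noted above, the approach that worked for 5.0/5.1 may need adjustment on 5.3.x.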


Is it possible to use JiebaTokenizer for multilingual documents?

2015-10-29 Thread Zheng Lin Edwin Yeo
I would like to check, is it possible to use JiebaTokenizerFactory to index
Multilingual documents in Solr?

I found that JiebaTokenizerFactory works better for Chinese characters as
compared to HMMChineseTokenizerFactory.

However, for English characters, the JiebaTokenizerFactory is cutting the
words at the wrong place. For example, it will cut the word "water" as
follows:
*w|at|er*

It means that Solr will search for 3 separate words of "w", "at" and "er"
instead of the entire word "water".

Is there any way to solve this problem, besides using separate fields for
English and Chinese characters?

Regards,
Edwin
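The separate-field approach mentioned above is typically sketched as below. The field and type names are assumptions, and since Jieba is a third-party plugin, the tokenizer factory's fully qualified class name depends on how the plugin jar is packaged:

```xml
<!-- English content: standard English analysis chain -->
<field name="content_en" type="text_en" indexed="true" stored="true"/>
<!-- Chinese content: Jieba segmentation -->
<field name="content_zh" type="text_zh" indexed="true" stored="true"/>

<fieldType name="text_zh" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Class name is an assumption; match it to your plugin build -->
    <tokenizer class="analyzer.solr5.jieba.JiebaTokenizerFactory" segMode="SEARCH"/>
  </analyzer>
</fieldType>
```

At query time the two fields can then be searched together, e.g. with edismax and qf=content_en content_zh, so each language is tokenized by the analyzer suited to it.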


Re: Closing Windows CMD kills Solr

2015-10-29 Thread Charlie Hull

On 29/10/2015 00:12, Steven White wrote:

Hi Folks,

I don't understand if this is an expected behavior or not.

On Windows, I start Solr from a command prompt like so:

 bin\solr start -p 8983 -s C:\MySolrIndex

Now, once I close the command prompt, the Java process that started Solr is
killed. Is this expected? How do I keep Solr alive when I close the
command prompt?

This does not happen on Linux.

Thanks in advance.

Steve

Yes, this is expected behaviour. If you want to prevent it you either 
need to keep the command window open, or run Solr as a Windows Service 
(kinda like a Unix daemon, but more annoying). To do the latter I would 
recommend NSSM - here are some links that will help:
http://mikerobbins.co.uk/2014/11/07/install-solr-as-windows-service-for-sitecore-content-search/ 
(from step 2)

http://www.norconex.com/how-to-run-solr5-as-a-service-on-windows/

HTH

Charlie



--
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk


SolrJ stalls/hangs on client.add(); and doesn't return

2015-10-29 Thread Markus Jelsma
Hello - we have some processes that periodically send documents to Solr 5.3.0 in local
mode using ConcurrentUpdateSolrClient 5.3.0, with queueSize 10 and
threadCount 4, chosen arbitrarily as we had no idea what is right.

Usually it's a few thousand up to some tens of thousands of rather small
documents. Now, when the number of documents is around or near a hundred
thousand, client.add(Iterator<SolrInputDocument> docIterator) stalls and never
returns. It also doesn't index any of the documents. Upon being called, it quickly
eats CPU and a load of heap, but shortly after it goes idle, uses no CPU, and the
memory is released.

I am puzzled, any ideas to share?
Markus
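One common workaround in situations like this is to feed the client in bounded batches rather than handing it one huge iterator, so progress is observable and any stall is limited to a single batch. A minimal, Solr-independent sketch of the batching step follows; the batch size of 1000 and the client calls mentioned in the comments are assumptions, not taken from the original report:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/**
 * Splits a large document iterator into fixed-size batches. The idea is
 * to call client.add(batch) per batch (and commit at the end) instead of
 * passing one enormous iterator to ConcurrentUpdateSolrClient.
 */
public class BatchAdd {

    public static <T> List<List<T>> partition(Iterator<T> it, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        List<T> current = new ArrayList<>(batchSize);
        while (it.hasNext()) {
            current.add(it.next());
            if (current.size() == batchSize) {
                batches.add(current);
                current = new ArrayList<>(batchSize);
            }
        }
        if (!current.isEmpty()) {
            batches.add(current);   // final, possibly short, batch
        }
        return batches;
    }

    public static void main(String[] args) {
        // Stand-in for a stream of SolrInputDocument objects
        List<Integer> docs = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            docs.add(i);
        }
        List<List<Integer>> batches = partition(docs.iterator(), 1000);
        System.out.println(batches.size());   // 3 batches: 1000 + 1000 + 500
        // In real code: for each batch, client.add(batch); then
        // client.blockUntilFinished() and client.commit() at the end.
    }
}
```

Per-batch calls also make it easy to log throughput and spot exactly where indexing slows down or stops.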


RE: Closing Windows CMD kills Solr

2015-10-29 Thread Routley, Alan
Hi Steve,

This is expected behaviour.

I get around this by creating a scheduled task, set to run at startup to start 
Solr.

Hope this helps

Alan

-Original Message-
From: Steven White [mailto:swhite4...@gmail.com]
Sent: 29 October 2015 00:13
To: solr-user@lucene.apache.org
Subject: Closing Windows CMD kills Solr

Hi Folks,

I don't understand if this is an expected behavior or not.

On Windows, I start Solr from a command prompt like so:

bin\solr start -p 8983 -s C:\MySolrIndex

Now, once I close the command prompt, the Java process that started Solr is
killed. Is this expected? How do I keep Solr alive when I close the command
prompt?

This does not happen on Linux.

Thanks in advance.

Steve


**
Experience the British Library online at www.bl.uk
The British Library’s latest Annual Report and Accounts : 
www.bl.uk/aboutus/annrep/index.html
Help the British Library conserve the world's knowledge. Adopt a Book. 
www.bl.uk/adoptabook
The Library's St Pancras site is WiFi - enabled
*
The information contained in this e-mail is confidential and may be legally 
privileged. It is intended for the addressee(s) only. If you are not the 
intended recipient, please delete this e-mail and notify the 
postmas...@bl.uk : The contents of this e-mail must 
not be disclosed or copied without the sender's consent.
The statements and opinions expressed in this message are those of the author 
and do not necessarily reflect those of the British Library. The British 
Library does not take any responsibility for the views of the author.
*
Think before you print


Re: org.apache.solr.common.SolrException: Document is missing mandatory uniqueKey field: id

2015-10-29 Thread fabigol
Hi,
Thanks for your reply.
When you say 'You must have a field labeled "id" in the doc sent to Solr',
do you mean that the response of the select must contain an "id"? Must I
write "select 'something' as id"? Is that correct?
In schema.xml I have the following line:
ID
 end
In my data-import file I have:


I think I must map the ID column in my data-import file, making the
mapping with the "select something as id..." query.
Is that correct?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/org-apache-solr-common-SolrException-Document-is-missing-mandatory-uniqueKey-field-id-tp4237067p4237147.html
Sent from the Solr - User mailing list archive at Nabble.com.
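A data-config sketch of the "select ... as id" mapping being discussed looks like the following; the table and column names (my_table, my_pk, title) are placeholders, not taken from the original configuration:

```xml
<document>
  <!-- my_table / my_pk / title are hypothetical names -->
  <entity name="doc" query="SELECT my_pk AS id, title FROM my_table">
    <field column="id" name="id"/>
    <field column="title" name="title"/>
  </entity>
</document>
```

This pairs with a matching <uniqueKey>id</uniqueKey> in schema.xml. Note that Solr field names are case-sensitive, so an alias of "ID" will not populate a uniqueKey field declared as "id".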


Re: Many mapping files

2015-10-29 Thread Gora Mohanty
On 28 October 2015 at 19:45, fabigol  wrote:
>
> Thanks for your response.
> I have 7 *.xml files. I have worked with Solr before, but with only one file.
> My question is why this project has 7 files describing an entity.

I am afraid that it is still difficult for an external person to guess
at how these files are being used without looking at complete details.
These look like separate data-config.xml files for DIH, and my best
guess is that either these were used for trial-and-error while setting
up the Solr installation, or that the person responsible manually
edited Solr's configuration files to use one or the other
data-config.xml (not sure why one would do that).

You might be better off trying to understand things from the other
end, i.e., figuring out what searches are made in the system, and what
data from the RDBMS needs to be imported into Solr for these. That
should lead you to an understanding of how these DIH data
configuration files might be used.

Regards,
Gora


language plugin

2015-10-29 Thread Chaushu, Shani
Hi,
 I'm using the Solr language detection plugin on the field named "content" (Solr 4.10,
plugin LangDetectLanguageIdentifierUpdateProcessorFactory).
When I index a document for the first time it works fine, but if I then set one
field again (regardless of whether it's the content field or not) the language
reverts to its default. When I update another field I would like the language to
stay the way it was before, and I don't want to re-insert all the content. Is
there an option that tells the plugin not to recalculate the language? (Setting
langid.overwrite to false didn't work.)

Thanks,
Shani


-
Intel Electronics Ltd.

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.