Re: How to route a document to a particular shard range

2017-11-07 Thread Amrit Sarkar
Ketan,

If you already know the indexing architecture, isn't it better to use the
"implicit" router and write the routing logic on your own end?

If the document belongs to "Org1", send the document with the extra param
"_route_=shard1", and likewise for the others.

Snippet from the official doc:
https://lucene.apache.org/solr/guide/6_6/shards-and-indexing-data-in-solrcloud.html#ShardsandIndexingDatainSolrCloud-DocumentRouting

> If you created the collection and defined the "implicit" router at the time
> of creation, you can additionally define a router.field parameter to use a
> field from each document to identify a shard where the document belongs. If
> the field specified is missing in the document, however, the document will
> be rejected. You could also use the _route_ parameter to name a specific
> shard.
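
For illustration, a minimal SolrJ sketch of that approach -- the collection
name, org field, and ZooKeeper address below are placeholders, not taken from
this thread:

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.request.UpdateRequest;
import org.apache.solr.common.SolrInputDocument;

public class RouteToShardExample {
  public static void main(String[] args) throws Exception {
    // Assumes a collection created with router.name=implicit.
    CloudSolrClient client = new CloudSolrClient.Builder()
        .withZkHost("localhost:9983")
        .build();
    client.setDefaultCollection("mycollection");

    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "project1-doc1");
    doc.addField("org_s", "Org1");

    UpdateRequest req = new UpdateRequest();
    req.add(doc);
    req.setParam("_route_", "shard1");   // Org1 documents go to shard1
    req.process(client);

    client.commit();
    client.close();
  }
}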



Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2

On Wed, Nov 8, 2017 at 11:15 AM, Ketan Thanki  wrote:

> Hi,
>
> I have requirement now quite different as I need to set routing key hash
> for document which confirm it to send to particular shard as its range.
>
> I have solrcloud configuration with 4 shard  & 4 replica with below shard
> range.
> shard1: 8000-bfff
> shard2: c000-
> shard3: 0-3fff
> shard4: 4000-7fff
>
> e.g: below show the project  works in organization which is my routing key.
> Org1= works for project1,project2
> Org2=works for project3
> Org3=works for project4
> Org4=project5
>
> So as mentions above I want to index org1 to shard1,org2 to shard2,org3 to
> shard3,org4 to shard4 meanwhile send it to particular shard.
> How could I manage compositeId routing to do this.
>
> Regards,
> Ketan.
>
>


Re: Issues with Graphite reporter config

2017-11-07 Thread sudershan madhavan
Can someone confirm whether this needs to be reported as a bug? The exception
in the patch seems to be different from the one in SOLR-11413. Also,
the issue is not sporadic but occurs every time the Graphite reporter is
invoked for multiple metrics.

Regards
Sudershan Madhavan

On Tue, Nov 7, 2017 at 7:10 PM, sudershan madhavan <
sudershan.madha...@gmail.com> wrote:

> Thank you Cassandra. Does seem like a thread unsafe operation issue. But
> what confuses me is the error occurs every time and only occurs when I have
> multiple metrics group configured. Also the exception is null pointer on
> the linked list instead of already connected exception
>
> Regards
> Sudershan Madhavan
>
> On 7 Nov 2017 6:18 pm, "Cassandra Targett"  wrote:
>
>> I believe this is https://issues.apache.org/jira/browse/SOLR-11413,
>> which has a fix already slated for Solr 7.2.
>>
>> On Tue, Nov 7, 2017 at 10:44 AM, sudershan madhavan
>>  wrote:
>> > Hi,
>> > I am running Solrcloud version: 6.6.1
>> > I have been trying to use graphite to report solr metrics and seem to
>> get
>> > the below error while doing so in the solr logs:
>> >>
>> >> java.lang.NullPointerException
>> >> at
>> >> com.codahale.metrics.graphite.PickledGraphite.pickleMetrics(
>> PickledGraphite.java:313)
>> >> at
>> >> com.codahale.metrics.graphite.PickledGraphite.writeMetrics(P
>> ickledGraphite.java:255)
>> >> at
>> >> com.codahale.metrics.graphite.PickledGraphite.send(PickledGr
>> aphite.java:213)
>> >> at
>> >> com.codahale.metrics.graphite.GraphiteReporter.reportGauge(G
>> raphiteReporter.java:345)
>> >> at
>> >> com.codahale.metrics.graphite.GraphiteReporter.report(Graphi
>> teReporter.java:243)
>> >> at
>> >> com.codahale.metrics.ScheduledReporter.report(ScheduledRepor
>> ter.java:251)
>> >> at
>> >> com.codahale.metrics.ScheduledReporter$1.run(ScheduledReport
>> er.java:174)
>> >> at
>> >> java.util.concurrent.Executors$RunnableAdapter.call(
>> Executors.java:511)
>> >> at
>> >> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>> >> at
>> >> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFu
>> tureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>> >> at
>> >> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFu
>> tureTask.run(ScheduledThreadPoolExecutor.java:294)
>> >> at
>> >> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPool
>> Executor.java:1142)
>> >> at
>> >> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoo
>> lExecutor.java:617)
>> >> at java.lang.Thread.run(Thread.java:745)
>> >> 2017-11-07 15:28:47.543 WARN  (metrics-graphite-reporter-3-thread-1)
>> [   ]
>> >> c.c.m.g.GraphiteReporter Unable to report to Graphite
>> >> java.net.SocketException: Socket closed
>> >> at
>> >> java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:118)
>> >> at java.net.SocketOutputStream.wr
>> ite(SocketOutputStream.java:143)
>> >> at
>> >> com.codahale.metrics.graphite.PickledGraphite.writeMetrics(P
>> ickledGraphite.java:261)
>> >> at
>> >> com.codahale.metrics.graphite.PickledGraphite.send(PickledGr
>> aphite.java:213)
>> >> at
>> >> com.codahale.metrics.graphite.GraphiteReporter.sendIfEnabled
>> (GraphiteReporter.java:328)
>> >> at
>> >> com.codahale.metrics.graphite.GraphiteReporter.reportTimer(G
>> raphiteReporter.java:288)
>> >> at
>> >> com.codahale.metrics.graphite.GraphiteReporter.report(Graphi
>> teReporter.java:259)
>> >> at
>> >> com.codahale.metrics.ScheduledReporter.report(ScheduledRepor
>> ter.java:251)
>> >> at
>> >> com.codahale.metrics.ScheduledReporter$1.run(ScheduledReport
>> er.java:174)
>> >> at
>> >> java.util.concurrent.Executors$RunnableAdapter.call(
>> Executors.java:511)
>> >> at
>> >> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>> >> at
>> >> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFu
>> tureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>> >> at
>> >> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFu
>> tureTask.run(ScheduledThreadPoolExecutor.java:294)
>> >> at
>> >> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPool
>> Executor.java:1142)
>> >> at
>> >> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoo
>> lExecutor.java:617)
>> >> at java.lang.Thread.run(Thread.java:745)
>> >> 2017-11-07 15:28:47.543 ERROR (metrics-graphite-reporter-1-thread-1)
>> [   ]
>> >> c.c.m.ScheduledReporter Exception thrown from GraphiteReporter#report.
>> >> Exception was suppressed.
>> >> java.lang.NullPointerException
>> >> at java.util.LinkedList$ListItr.next(LinkedList.java:893)
>> >> at
>> >> com.codahale.metrics.graphite.PickledGraphite.pickleMetrics(
>> PickledGraphite.java:305)
>> >> at
>> >> com.codahale.metrics.graphite.PickledGraphite.

Re: Developing custom tokenizer/filter in solr 5.4.1

2017-11-07 Thread kumar gaurav
Hi Erick

I am very happy to see your reply.

It was mistakenly written as 5.4.1 in the last mail. I am developing the plugin in
Solr 5.2.1.

I am compiling the jars and executing against the same version, i.e. 5.2.1, yet I
am getting the following error:

Caused by: org.apache.solr.common.SolrException: Plugin init failure
for [schema.xml] analyzer/filter: class
com.skyrim.ReverseFilterFactory
at 
org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:178)
at 
org.apache.solr.schema.FieldTypePluginLoader.readAnalyzer(FieldTypePluginLoader.java:401)
at 
org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:104)
at 
org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:52)
at 
org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:152)
... 16 more
Caused by: java.lang.ClassCastException: class com.skyrim.ReverseFilterFactory
at java.lang.Class.asSubclass(Class.java:3404)
at 
org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:475)
at 
org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:560)
at 
org.apache.solr.schema.FieldTypePluginLoader$3.create(FieldTypePluginLoader.java:383)
at 
org.apache.solr.schema.FieldTypePluginLoader$3.create(FieldTypePluginLoader.java:377)
at 
org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:152)



Please help me; it is very urgent. I need to build a custom tokenizer like
StandardTokenizerFactory in which I will write my own rules for indexing.
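
For reference, a minimal filter-factory skeleton for Solr/Lucene 5.x looks
roughly like the following (package and class names here are made up, not the
com.skyrim classes from the error above). A class referenced from a <filter/>
element in the schema must extend TokenFilterFactory and be compiled against
the same Solr/Lucene version that loads it; registering a TokenizerFactory
subclass as a filter is one common way to hit exactly this ClassCastException:

import java.util.Map;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.reverse.ReverseStringFilter;
import org.apache.lucene.analysis.util.TokenFilterFactory;

/**
 * Minimal custom filter factory for Solr/Lucene 5.x.  A class used in a
 * <filter class="..."/> element must extend TokenFilterFactory.
 */
public class MyReverseFilterFactory extends TokenFilterFactory {

  public MyReverseFilterFactory(Map<String, String> args) {
    super(args);
    if (!args.isEmpty()) {
      throw new IllegalArgumentException("Unknown parameters: " + args);
    }
  }

  @Override
  public TokenStream create(TokenStream input) {
    // Swap in your own TokenFilter implementing your indexing rules.
    return new ReverseStringFilter(input);
  }
}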






On Wed, Nov 8, 2017 at 4:30 AM, Erick Erickson 
wrote:

> Looks to me like you're compiling against the jars from one version of
> Solr and executing against another.
>
> /root/solr-5.2.1/server/solr/#/conf/managed-schema
>
> yet you claim to be using 5.4.1
>
> On Tue, Nov 7, 2017 at 12:00 PM, kumar gaurav  wrote:
> > Hi
> >
> > I am developing my own custom filter in solr 5.4.1.
> >
> > I have created a jar of a filter class with extend to TokenizerFactory
> > class .
> >
> > When i loaded in to sol config and add my filter to managed-schema , i
> > found following error -
> >
> > org.apache.solr.common.SolrException: Could not load conf for core
> > : Plugin init failure for [schema.xml] fieldType "text_reversed":
> > Plugin init failure for [schema.xml] analyzer/filter: class
> > com.skyrim.ReverseFilterFactory. Schema file is
> > /root/solr-5.2.1/server/solr/#/conf/managed-schema
> >
> >
> > Caused by: java.lang.ClassCastException: class com.skyrim.
> ReverseFilterFactory
> >
> >
> > Why java.lang.ClassCastException is occurring while loading a plugin ?
> >
> >
> > Please help someone . very much thanks in advance .
> >
> >
> >
> >
> > regards
> >
> > Kumar Gaurav
> >
> > Software Engineer
>


Re: Streaming Expression usage

2017-11-07 Thread Amrit Sarkar
Kojo,

I'm not sure what you mean by making two requests to get documents. A
"search" streaming expression can be passed an "fq" parameter to filter
the results, and a rollup on top of that will fetch the desired results. This
may not be mentioned in the official docs:

Sample streaming expression:

expr=rollup(
       search(collection1,
              zkHost="localhost:9983",
              qt="/export",
              q="*:*",
              fq=a_s:filter_a,
              fl="id,a_s,a_i,a_f",
              sort="a_f asc"),
       over=a_f)
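
To run such an expression from SolrJ, here is a rough sketch (I have added a
count(*) metric, since rollup is normally used with at least one aggregate;
the URL and field names simply follow the sample above):

import org.apache.solr.client.solrj.io.Tuple;
import org.apache.solr.client.solrj.io.stream.SolrStream;
import org.apache.solr.common.params.ModifiableSolrParams;

public class RollupExample {
  public static void main(String[] args) throws Exception {
    String expr = "rollup("
        + "search(collection1, zkHost=\"localhost:9983\", qt=\"/export\", "
        + "q=\"*:*\", fq=a_s:filter_a, fl=\"id,a_s,a_i,a_f\", sort=\"a_f asc\"), "
        + "over=a_f, count(*))";

    ModifiableSolrParams params = new ModifiableSolrParams();
    params.set("expr", expr);
    params.set("qt", "/stream");   // send the expression to the /stream handler

    SolrStream stream = new SolrStream("http://localhost:8983/solr/collection1", params);
    try {
      stream.open();
      while (true) {
        Tuple tuple = stream.read();
        if (tuple.EOF) {
          break;
        }
        System.out.println(tuple.get("a_f") + " -> " + tuple.get("count(*)"));
      }
    } finally {
      stream.close();
    }
  }
}
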
Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2

On Wed, Nov 8, 2017 at 7:41 AM, Kojo  wrote:

> Hi,
> I am working on PoC of a front-end web to provide an interface to the end
> user search and filter data on Solr indexes.
>
> I am trying Streaming Expression for about a week and I am fairly keen
> about using it to search and filter indexes on Solr side. But I am not sure
> whether this is the right approach or not.
>
> A simple question to illustrate my doubts: If use the search and some
> Streaming Expressions more to get and filter the indexes to get documents,
> and I want to rollup the result, will I have to make two requests? Is this
> a good use for Streaming Expressions?
>


Re: Can someone help? Two level nested doc... ChildDocTransformerFactory syntax...

2017-11-07 Thread Mikhail Khludnev
You can chain two [subquery] transformers, but really it's better to receive
them flat and sort children and grandchildren across levels in post-processing.

On Tue, Nov 7, 2017 at 4:05 AM, Petersen, Robert (Contr) <
robert.peters...@ftr.com> wrote:

> OK no faceting, no filtering, I just want the hierarchy to come backin the
> results. Can't quite get it... googled all over the place too.
>
>
> Doc:
>
> { id : asdf, type_s:customer, firstName_s:Manny, lastName_s:Acevedo,
> address_s:"123 Fourth Street", city_s:Gotham, tn_s:1234561234,
>   _childDocuments_:[
>   { id : adsf_c1,
> src_s : "CRM.Customer",
> type_s:customerSource,
> _childDocuments_:[
> {
> id : asdf_c1_c1,
> type_s:customerSourceType,
> "key_s": "id",
> "value_s": "GUID"
> }
> ]
> },
>   { id : adsf_c2,
> "src_s": "DPI.SalesOrder",
> type_s:customerSource,
> _childDocuments_:[
> {
> id : asdf_c2_c1,
> type_s:customerSourceType,
> "key_s": "btn",
> "value_s": "4052328908"
> },
> {
> id : asdf_c2_c2,
> type_s:customerSourceType,
> "key_s": "seq",
> "value_s": "5"
>},
> {
> id : asdf_c2_c3,
> type_s:customerSourceType,
> "key_s": "env",
> "value_s": "MS"
> }
> ]
> }
> ]
> }
>
>
> Queries:
>
> Gives all nested docs regardless of level as a flat set
> http://localhost:8983/solr/temptest/select?q=id:asdf&fl=
> id,[child%20parentFilter=type_s:customer]
>
> Gives all nested child docs only
> http://localhost:8983/solr/temptest/select?q=id:asdf&fl=
> id,[child%20parentFilter=type_s:customer%20childFilter=type_
> s:customerSource]
>
> How to get nested grandchild docs at correct level?
> Nope exception:
> http://localhost:8983/solr/temptest/select?q=id:asdf&fl=
> id,[child%20parentFilter=type_s:customer%20childFilter=type_
> s:customerSource],[child%20parentFilter=type_s:
> customerSource%20childFilter=type_s:customerSourceType]
>
> Nope exception:
> http://localhost:8983/solr/temptest/select?q=id:asdf&fl=
> id,[child%20parentFilter=type_s:customer%20childFilter=type_
> s:customerSource],[child%20parentFilter=type_s:customerSource]
>
>
> Nope but no exception only gets children again tho like above:
> http://localhost:8983/solr/temptest/select?q=id:asdf&fl=
> id,[child%20parentFilter=type_s:customer%20childFilter=type_
> s:customerSource],[child%20parentFilter=type_s:customer*]
>
> Nope but no exception only gets children again: solr/temptest/select?q=id:asdf&fl=id,[child%20parentFilter=type_s:
> customer%20childFilter=type_s:customerSource],[child%
> 20parentFilter=type_s:customer*%20childFilter=type_s:customerSourceType]>
>
> http://localhost:8983/solr/temptest/select?q=id:asdf&fl=
> id,[child%20parentFilter=type_s:customer%20childFilter=type_
> s:customerSource],[child%20parentFilter=type_s:
> customer*%20childFilter=type_s:customerSourceType]
>
>
> Nope same again... no grandchildren:
>
> http://localhost:8983/solr/temptest/select?q=id:asdf&fl=
> id,p:[child%20parentFilter=type_s:customer%20childFilter=
> type_s:customerSource],q:[child%20parentFilter=-type_s:
> customer%20parentFilter=type_s:customerSource%20childFilter=type_s:
> customerSourceType]
>
>
> Gives all but flat no child to grandchild hierarchy:
>
> http://localhost:8983/solr/temptest/select?q=id:asdf&fl=
> id,p:[child%20parentFilter=type_s:customer%20childFilter=
> type_s:customerSource],q:[child%20parentFilter=type_s:
> customer%20childFilter=type_s:customerSourceType]
>
>
> Thanks in advance,
>
> Robi
>
> 
>
>



-- 
Sincerely yours
Mikhail Khludnev


How to route a document to a particular shard range

2017-11-07 Thread Ketan Thanki
Hi,

I have a requirement that is now quite different, as I need to set a routing key hash
for each document that ensures it is sent to a particular shard based on its range.

I have a SolrCloud configuration with 4 shards & 4 replicas, with the shard ranges below.
shard1: 8000-bfff
shard2: c000-
shard3: 0-3fff
shard4: 4000-7fff

e.g.: below shows which projects belong to which organization; the organization is my routing key.
Org1= works for project1,project2
Org2=works for project3
Org3=works for project4
Org4=project5

So, as mentioned above, I want to index Org1 to shard1, Org2 to shard2, Org3 to
shard3, and Org4 to shard4, i.e. send each document to a particular shard.
How could I manage compositeId routing to do this?

Regards,
Ketan.



Streaming Expression usage

2017-11-07 Thread Kojo
Hi,
I am working on a PoC of a front-end web app to provide an interface for the end
user to search and filter data in Solr indexes.

I have been trying Streaming Expressions for about a week and I am fairly keen
on using them to search and filter indexes on the Solr side. But I am not sure
whether this is the right approach or not.

A simple question to illustrate my doubts: if I use search and some more
Streaming Expressions to filter the indexes and get documents, and I then want
to roll up the result, will I have to make two requests? Is this a good use of
Streaming Expressions?


Re: Developing custom tokenizer/filter in solr 5.4.1

2017-11-07 Thread Erick Erickson
Looks to me like you're compiling against the jars from one version of
Solr and executing against another.

/root/solr-5.2.1/server/solr/#/conf/managed-schema

yet you claim to be using 5.4.1

On Tue, Nov 7, 2017 at 12:00 PM, kumar gaurav  wrote:
> Hi
>
> I am developing my own custom filter in solr 5.4.1.
>
> I have created a jar of a filter class with extend to TokenizerFactory
> class .
>
> When i loaded in to sol config and add my filter to managed-schema , i
> found following error -
>
> org.apache.solr.common.SolrException: Could not load conf for core
> : Plugin init failure for [schema.xml] fieldType "text_reversed":
> Plugin init failure for [schema.xml] analyzer/filter: class
> com.skyrim.ReverseFilterFactory. Schema file is
> /root/solr-5.2.1/server/solr/#/conf/managed-schema
>
>
> Caused by: java.lang.ClassCastException: class com.skyrim.ReverseFilterFactory
>
>
> Why java.lang.ClassCastException is occurring while loading a plugin ?
>
>
> Please help someone . very much thanks in advance .
>
>
>
>
> regards
>
> Kumar Gaurav
>
> Software Engineer


Re: Solr - phrase suggestion returning duplicate

2017-11-07 Thread ruby
Yes, id is a unique field in my schema.

I found the following Jira issue:
https://issues.apache.org/jira/browse/LUCENE-6336

It looks related to me. It does not mention that it was fixed. Is it fixed
in Solr 6.1? I'm using Solr 6.1.





Re: Solr - phrase suggestion returning duplicate

2017-11-07 Thread ruby
Yes, id is a unique field.

I found the following issue in Jira:
https://issues.apache.org/jira/browse/LUCENE-6336

It says the affected versions are 4.10.3 and 5.0. I'm using Solr 6.1 and seeing
this issue.

You can recreate it by indexing the documents I shared and querying.





Developing custom tokenizer/filter in solr 5.4.1

2017-11-07 Thread kumar gaurav
Hi

I am developing my own custom filter in Solr 5.4.1.

I have created a jar containing a filter class that extends the TokenizerFactory
class.

When I loaded it into the Solr config and added my filter to the managed-schema, I
found the following error:

org.apache.solr.common.SolrException: Could not load conf for core
: Plugin init failure for [schema.xml] fieldType "text_reversed":
Plugin init failure for [schema.xml] analyzer/filter: class
com.skyrim.ReverseFilterFactory. Schema file is
/root/solr-5.2.1/server/solr/#/conf/managed-schema


Caused by: java.lang.ClassCastException: class com.skyrim.ReverseFilterFactory


Why is a java.lang.ClassCastException occurring while loading the plugin?


Please help, someone. Many thanks in advance.




regards

Kumar Gaurav

Software Engineer


Re: Long blocking during indexing + deleteByQuery

2017-11-07 Thread Chris Troullis
@Erick, I see, thanks for the clarification.

@Shawn, Good idea for the workaround! I will try that and see if it
resolves the issue.

Thanks,

Chris

On Tue, Nov 7, 2017 at 1:09 PM, Erick Erickson 
wrote:

> bq: you think it is caused by the DBQ deleting a document while a
> document with that same ID
>
> No. I'm saying that DBQ has no idea _if_ that would be the case so
> can't carry out the operations in parallel because it _might_ be the
> case.
>
> Shawn:
>
> IIUC, here's the problem. For deleteById, I can guarantee the
> sequencing through the same optimistic locking that regular updates
> use (i.e. the _version_ field). But I'm kind of guessing here.
>
> Best,
> Erick
>
> On Tue, Nov 7, 2017 at 8:51 AM, Shawn Heisey  wrote:
> > On 11/5/2017 12:20 PM, Chris Troullis wrote:
> >> The issue I am seeing is when some
> >> threads are adding/updating documents while other threads are issuing
> >> deletes (using deleteByQuery), solr seems to get into a state of extreme
> >> blocking on the replica
> >
> > The deleteByQuery operation cannot coexist very well with other indexing
> > operations.  Let me tell you about something I discovered.  I think your
> > problem is very similar.
> >
> > Solr 4.0 and later is supposed to be able to handle indexing operations
> > at the same time that the index is being optimized (in Lucene,
> > forceMerge).  I have some indexes that take about two hours to optimize,
> > so having indexing stop while that happens is a less than ideal
> > situation.  Ongoing indexing is similar in many ways to a merge, enough
> > that it is handled by the same Merge Scheduler that handles an optimize.
> >
> > I could indeed add documents to the index without issues at the same
> > time as an optimize, but when I would try my full indexing cycle while
> > an optimize was underway, I found that all operations stopped until the
> > optimize finished.
> >
> > Ultimately what was determined (I think it was Yonik that figured it
> > out) was that *most* indexing operations can happen during the optimize,
> > *except* for deleteByQuery.  The deleteById operation works just fine.
> >
> > I do not understand the low-level reasons for this, but apparently it's
> > not something that can be easily fixed.
> >
> > A workaround is to send the query you plan to use with deleteByQuery as
> > a standard query with a limited fl parameter, to retrieve matching
> > uniqueKey values from the index, then do a deleteById with that list of
> > ID values instead.
> >
> > Thanks,
> > Shawn
> >
>


Re: Solr - phrase suggestion returning duplicate

2017-11-07 Thread Erick Erickson
Is "id" the actual <uniqueKey> in your schema? If you indexed the same
document twice, the second one should overwrite the first one so
getting two docs back with the same ID is strange.

Best,
Erick

On Tue, Nov 7, 2017 at 10:43 AM, ruby  wrote:
> I'm trying to enable phrase suggestion in my application by using
> *AnalyzingInfixLookupFactory *and *DocumentDictionaryFactory*. Following is
> what my configuration looks like:
>
> 
>   
> mySuggester
>   AnalyzingInfixLookupFactory
>   suggester_infix_dir
> DocumentDictionaryFactory
> title
> suggestType
> false
> false
>   
> 
>  startup="lazy" >
>   
> true
> 10
> mySuggester
>   
>   
> suggest
>   
> 
>
> I have following documents indexed:
>
> 
> 44
> 
> 
>
> 
> 11
> Video gaming: the history
> 
> 
>
> 55
> Video games: multiplayer gaming
> 
>
> 
> 33
> Video gaming: the history
> 
>
> After indexing documents and building the suggester, when I query I get
> duplicate suggestions
>
> q.suggest=video
> returns
> [
>   {
> "id":"44",
> "*title":"Video gaming: the history"},*
>   {
> "id":"33",
> "title":"Video games: multiplayer gaming"},
>   {
> "id":"44",
> *"title":"Video gaming: the history"}]*
>
> Is this a known bug with Solr suggester? shouldn't suggester by default
> return unique suggestions?
>
>
> Thanks
>
>
>


Re: Trouble Installing Solr 7.1.0 On Ubuntu 17

2017-11-07 Thread Shawn Heisey
On 11/7/2017 11:51 AM, Dane Terrell wrote:
> I'm afraid that method doesn't work either. I am still perplexed as to how to 
> install Solr 7 on Ubuntu 17 on my local enviornment.

How about we start over.  The previous info shows that you have the Solr
download in /tmp.  I will assume that the file is still there.  If it's
not, then you will need to put it there again, or adjust the commands
below to the new location.

I'm guessing that you're trying to run the installer with default
options.  If that's the case, run these commands:

#--
mkdir ~/newdir
cd ~/newdir
tar xzf /tmp/solr-7.1.0.tgz \
 solr-7.1.0/bin/install_solr_service.sh \
 --strip-components=2
sudo bash install_solr_service.sh /tmp/solr-7.1.0.tgz
#--

There are four commands there.  I have split the "tar" command into
three lines that should be ready to paste directly into a shell prompt. 
I'm hoping that it will be properly formatted when it gets to you.

If you need additional options for the install script, go ahead and add
them to the end of the last command.

When you're done with those commands, you can delete that "newdir" that
the commands created.

Thanks,
Shawn



Re: Trouble Installing Solr 7.1.0 On Ubuntu 17

2017-11-07 Thread Dane Terrell
I'm afraid that method doesn't work either. I am still perplexed as to how to
install Solr 7 on Ubuntu 17 on my local environment.

Dane Michael Terrell

On Tuesday, October 24, 2017 9:44 AM, Shawn Heisey  
wrote:
 

 On 10/23/2017 9:11 PM, Dane Terrell wrote:
> Hi I'm new to apache solr. I'm looking to install apache solr 7.1.0 on my 
> localhost computer. I downloaded and extracted the tar file in my tmp folder. 
> But when I try to run the script... sudo: 
> solr-7.1.0/solr/bin/install_solr_service.sh: command not found
> or
> solr-7.1.0/solr/bin/install_solr_service.sh --strip-components=2
> I get the same error message. Can anyone help?

It looks like install_solr_service.sh is not executable.

I created a file named 'fff' in my current directory, with this content:

#!/bin/sh
echo yay

Then I proceeded to try to run it with sudo.  It gave the same message
you got.  Then I made it executable, tried it again, and it worked:

root@smeagol:~# sudo ./fff
sudo: ./fff: command not found
root@smeagol:~# chmod +x fff
root@smeagol:~# sudo ./fff
yay

You have two choices to fix this problem.  You can make the script
executable, or you can add "bash" right after sudo and before the script
path.

Thanks,
Shawn



   

Solr - phrase suggestion returning duplicate

2017-11-07 Thread ruby
I'm trying to enable phrase suggestions in my application by using
AnalyzingInfixLookupFactory and DocumentDictionaryFactory. Following is
what my configuration looks like:


  
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">mySuggester</str>
    <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
    <str name="indexPath">suggester_infix_dir</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">title</str>
    <str name="suggestAnalyzerFieldType">suggestType</str>
    <str name="buildOnStartup">false</str>
    <str name="buildOnCommit">false</str>
  </lst>
</searchComponent>

<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.count">10</str>
    <str name="suggest.dictionary">mySuggester</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>

I have the following documents indexed:


<doc>
  <field name="id">44</field>
</doc>

<doc>
  <field name="id">11</field>
  <field name="title">Video gaming: the history</field>
</doc>

<doc>
  <field name="id">55</field>
  <field name="title">Video games: multiplayer gaming</field>
</doc>

<doc>
  <field name="id">33</field>
  <field name="title">Video gaming: the history</field>
</doc>

After indexing documents and building the suggester, when I query I get
duplicate suggestions

q.suggest=video
returns
[
  {
    "id":"44",
    "title":"Video gaming: the history"},
  {
    "id":"33",
    "title":"Video games: multiplayer gaming"},
  {
    "id":"44",
    "title":"Video gaming: the history"}]

Is this a known bug with the Solr suggester? Shouldn't the suggester return
unique suggestions by default?


Thanks





Re: Issues with Graphite reporter config

2017-11-07 Thread sudershan madhavan
Thank you Cassandra. It does seem like a thread-unsafe operation issue. But
what confuses me is that the error occurs every time, and only when I have
multiple metrics groups configured. Also, the exception is a NullPointerException
on the linked list rather than an "already connected" exception.

Regards
Sudershan Madhavan

On 7 Nov 2017 6:18 pm, "Cassandra Targett"  wrote:

> I believe this is https://issues.apache.org/jira/browse/SOLR-11413,
> which has a fix already slated for Solr 7.2.
>
> On Tue, Nov 7, 2017 at 10:44 AM, sudershan madhavan
>  wrote:
> > Hi,
> > I am running Solrcloud version: 6.6.1
> > I have been trying to use graphite to report solr metrics and seem to get
> > the below error while doing so in the solr logs:
> >>
> >> java.lang.NullPointerException
> >> at
> >> com.codahale.metrics.graphite.PickledGraphite.pickleMetrics(
> PickledGraphite.java:313)
> >> at
> >> com.codahale.metrics.graphite.PickledGraphite.writeMetrics(
> PickledGraphite.java:255)
> >> at
> >> com.codahale.metrics.graphite.PickledGraphite.send(
> PickledGraphite.java:213)
> >> at
> >> com.codahale.metrics.graphite.GraphiteReporter.reportGauge(
> GraphiteReporter.java:345)
> >> at
> >> com.codahale.metrics.graphite.GraphiteReporter.report(
> GraphiteReporter.java:243)
> >> at
> >> com.codahale.metrics.ScheduledReporter.report(
> ScheduledReporter.java:251)
> >> at
> >> com.codahale.metrics.ScheduledReporter$1.run(
> ScheduledReporter.java:174)
> >> at
> >> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> >> at
> >> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> >> at
> >> java.util.concurrent.ScheduledThreadPoolExecutor$
> ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> >> at
> >> java.util.concurrent.ScheduledThreadPoolExecutor$
> ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> >> at
> >> java.util.concurrent.ThreadPoolExecutor.runWorker(
> ThreadPoolExecutor.java:1142)
> >> at
> >> java.util.concurrent.ThreadPoolExecutor$Worker.run(
> ThreadPoolExecutor.java:617)
> >> at java.lang.Thread.run(Thread.java:745)
> >> 2017-11-07 15:28:47.543 WARN  (metrics-graphite-reporter-3-thread-1)
> [   ]
> >> c.c.m.g.GraphiteReporter Unable to report to Graphite
> >> java.net.SocketException: Socket closed
> >> at
> >> java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:118)
> >> at java.net.SocketOutputStream.write(SocketOutputStream.java:
> 143)
> >> at
> >> com.codahale.metrics.graphite.PickledGraphite.writeMetrics(
> PickledGraphite.java:261)
> >> at
> >> com.codahale.metrics.graphite.PickledGraphite.send(
> PickledGraphite.java:213)
> >> at
> >> com.codahale.metrics.graphite.GraphiteReporter.sendIfEnabled(
> GraphiteReporter.java:328)
> >> at
> >> com.codahale.metrics.graphite.GraphiteReporter.reportTimer(
> GraphiteReporter.java:288)
> >> at
> >> com.codahale.metrics.graphite.GraphiteReporter.report(
> GraphiteReporter.java:259)
> >> at
> >> com.codahale.metrics.ScheduledReporter.report(
> ScheduledReporter.java:251)
> >> at
> >> com.codahale.metrics.ScheduledReporter$1.run(
> ScheduledReporter.java:174)
> >> at
> >> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> >> at
> >> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> >> at
> >> java.util.concurrent.ScheduledThreadPoolExecutor$
> ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> >> at
> >> java.util.concurrent.ScheduledThreadPoolExecutor$
> ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> >> at
> >> java.util.concurrent.ThreadPoolExecutor.runWorker(
> ThreadPoolExecutor.java:1142)
> >> at
> >> java.util.concurrent.ThreadPoolExecutor$Worker.run(
> ThreadPoolExecutor.java:617)
> >> at java.lang.Thread.run(Thread.java:745)
> >> 2017-11-07 15:28:47.543 ERROR (metrics-graphite-reporter-1-thread-1)
> [   ]
> >> c.c.m.ScheduledReporter Exception thrown from GraphiteReporter#report.
> >> Exception was suppressed.
> >> java.lang.NullPointerException
> >> at java.util.LinkedList$ListItr.next(LinkedList.java:893)
> >> at
> >> com.codahale.metrics.graphite.PickledGraphite.pickleMetrics(
> PickledGraphite.java:305)
> >> at
> >> com.codahale.metrics.graphite.PickledGraphite.writeMetrics(
> PickledGraphite.java:255)
> >> at
> >> com.codahale.metrics.graphite.PickledGraphite.send(
> PickledGraphite.java:213)
> >> at
> >> com.codahale.metrics.graphite.GraphiteReporter.sendIfEnabled(
> GraphiteReporter.java:328)
> >> at
> >> com.codahale.metrics.graphite.GraphiteReporter.reportMetered(
> GraphiteReporter.java:304)
> >> at
> >> com.codahale.metrics.graphite.GraphiteReporter.report(
> GraphiteReporter.java:255)
> >> at
> >> com.codahale.m

Re: Long blocking during indexing + deleteByQuery

2017-11-07 Thread Erick Erickson
bq: you think it is caused by the DBQ deleting a document while a
document with that same ID

No. I'm saying that DBQ has no idea _if_ that would be the case so
can't carry out the operations in parallel because it _might_ be the
case.

Shawn:

IIUC, here's the problem. For deleteById, I can guarantee the
sequencing through the same optimistic locking that regular updates
use (i.e. the _version_ field). But I'm kind of guessing here.

Best,
Erick

On Tue, Nov 7, 2017 at 8:51 AM, Shawn Heisey  wrote:
> On 11/5/2017 12:20 PM, Chris Troullis wrote:
>> The issue I am seeing is when some
>> threads are adding/updating documents while other threads are issuing
>> deletes (using deleteByQuery), solr seems to get into a state of extreme
>> blocking on the replica
>
> The deleteByQuery operation cannot coexist very well with other indexing
> operations.  Let me tell you about something I discovered.  I think your
> problem is very similar.
>
> Solr 4.0 and later is supposed to be able to handle indexing operations
> at the same time that the index is being optimized (in Lucene,
> forceMerge).  I have some indexes that take about two hours to optimize,
> so having indexing stop while that happens is a less than ideal
> situation.  Ongoing indexing is similar in many ways to a merge, enough
> that it is handled by the same Merge Scheduler that handles an optimize.
>
> I could indeed add documents to the index without issues at the same
> time as an optimize, but when I would try my full indexing cycle while
> an optimize was underway, I found that all operations stopped until the
> optimize finished.
>
> Ultimately what was determined (I think it was Yonik that figured it
> out) was that *most* indexing operations can happen during the optimize,
> *except* for deleteByQuery.  The deleteById operation works just fine.
>
> I do not understand the low-level reasons for this, but apparently it's
> not something that can be easily fixed.
>
> A workaround is to send the query you plan to use with deleteByQuery as
> a standard query with a limited fl parameter, to retrieve matching
> uniqueKey values from the index, then do a deleteById with that list of
> ID values instead.
>
> Thanks,
> Shawn
>


Re: recent utf8 problems

2017-11-07 Thread Chris Hostetter

: 1) When looking for Tübingen in the title, I am expecting the 3092484 

Just to be clear -- I'm reading that as an 8 character word, where the 2nd 
character is U+00FC and the other characters are plain ascii: T_bingen

Also to be clear: I'm attempting to reproduce the steps you describe using 
Solr 7.1, via "bin/solr -e techproducts"

I've indexed one additional document like so...

curl -H 'Content-Type: application/json' 
'http://localhost:8983/solr/techproducts/update?commit=true' --data-binary 
'[{"id":"HOSS","title":"Tübingen"}]'


: I should explain my test more clearly. We use a webbrowser (Firefox or 
: Chrome) to open the admin console of the search engine, which is at 
: http://localhost:8983/solr/#/mmc_search3/query 
:  on my local device. 
: This is the default behavior. In this webbrowser, I use the query 
: "title:T%C3%BCbingen” in the field “g” with /select as the 

If you type "title:T%C3%BCbingen" into the "q" param of the 
/solr/#/mmc_search3/query UI then you are *NOT* searching for an 8 
character word where the second character is U+00FC.

You are in fact searching for a 13 character word where the 2nd and 5th 
characters are the plain old ascii '%' -- the UI expects the *raw* string 
you wish to search for, and handles the URL encoding for you.

If you look at the solr logs when you hit the "Query" button after typing 
"title:T%C3%BCbingen" into the serach box, you should see this...

... webapp=/solr path=/select 
params={q=title:T%25C3%25BCbingen&_=1510074136657} ...

those are the *URL decoded* params

you should also see in the "response" portion of the UI, that the "params" 
contains...

"params":{
  "q":"title:T%C3%BCbingen",
  "_":"1510074136657"}},

That is, again, the URL decoded params.

Likewise, if i use the UI to change the "wt" to "python" the response now 
shows me...

'params':{
  'q':'title:T%C3%BCbingen',
  'wt':'python',
  '_':'1510074872875'}},

...there is no python unicode escaping here because there is none needed 
-- all of the characters in my 'q' param are plain old ascii characters

Solr doesn't know that you want to search for "LATIN SMALL LETTER U WITH 
DIAERESIS" -- it thinks you want to search for "percent followed by C3 
followed by percent followed by BC"
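
As an aside, if you ever build the request URL by hand (outside the admin UI 
or SolrJ, both of which encode the raw string for you), encode exactly once -- 
a tiny sketch, with a made-up core name:

import java.net.URLEncoder;

public class EncodeOnce {
  public static void main(String[] args) throws Exception {
    String rawQuery = "title:T\u00FCbingen";   // the raw, un-encoded query string
    String url = "http://localhost:8983/solr/techproducts/select?q="
        + URLEncoder.encode(rawQuery, "UTF-8");
    // -> ...select?q=title%3AT%C3%BCbingen
    System.out.println(url);
    // Encoding an already-encoded string ("title:T%C3%BCbingen") a second time
    // is what produces a literal search for the percent sequences instead of
    // the u-umlaut.
  }
}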

Following the steps you describe, in all of the above queries, i got 
numFound=0 ... but if i change the query i type in the UI to 
"title:Tübingen" (ie: type the plain unicode characters w/o attempting any 
special URL encoding myself) then everything works -- with the python 
output, note the unicode escape sequences...

'params':{
  'q':u'title:T\u00fcbingen',
  'wt':'python',
  '_':'1510074872875'}},
  'response':{'numFound':1,'start':0,'docs':[
  {
'id':'HOSS',
'title':[u'T\u00fcbingen'],
'_version_':1583428186725679104}]
  }}


And now what i get in the solr.log...

... webapp=/solr path=/select 
params={q=title:Tübingen&wt=python&_=1510074872875} ...


Part of your confusion may be that some versions of some browsers try to 
be helpful by making urls "human readable" and hiding the fact that 
certain characters are actually being URL encoded.

for example -- your email contains the following verbatim text...

: 4) 
: 
http://localhost:8983/solr/mmc_search3/select?echoParams=all&q=title:Tübingen&wt=python
 
: 

 
: is displayed but it is 

Note that these 2 "urls" -- one of which is in theory just the 
"linkable" version of the other -- are not equivalent.  Most likely 
because your email client tried to be helpful when you pasted a URL from 
your browser?  

From what you've described, it's not actually 100% clear what actual bytes 
you are seeing in the browser URL -- let alone what bytes your browser is 
actually sending to solr.



Based on your further comments, it appears that the reason you are getting 
results when you send the *literal* query for the word "T%C3%BCbingen" is 
because that's literally what you indexed for these documents.

Note the example document you showed when you get the "good" results 
(numFound=3092484) with wt=python...

: 'params':{
:   'q':'title:T%C3%BCbingen',
:   'echoParams':'all',
:   'wt':'python',
:   '_':'1510024595963'}},
:   'response':{'numFound':3092484,'start':0,'docs':[
:   {
: 'photoid':'6182384834',
: 'hash':'7b201435fc5126accbfee6453b7fb181',
: 'userid':'48992104@N00',
: 'datetaken':'2011-09-04T13:19:16Z',
: 'dateuploaded':'2011-09-25T11:54:41Z',
: 'capturedevice':'NIKON COOLPIX S2500',
: 'title':'T%C3%BCbingen',



You are searching for the *literal* string 'T%C3%BCbingen' (containing 2 
percent symbols and no non-ascii characters) and you are finding it!  
Because that document also has the literal title of 'T%C3%BCbingen' 
(containing 2 percent symbols).

Re: Issues with Graphite reporter config

2017-11-07 Thread Cassandra Targett
I believe this is https://issues.apache.org/jira/browse/SOLR-11413,
which has a fix already slated for Solr 7.2.

On Tue, Nov 7, 2017 at 10:44 AM, sudershan madhavan
 wrote:
> Hi,
> I am running Solrcloud version: 6.6.1
> I have been trying to use graphite to report solr metrics and seem to get
> the below error while doing so in the solr logs:
>>
>> java.lang.NullPointerException
>> at
>> com.codahale.metrics.graphite.PickledGraphite.pickleMetrics(PickledGraphite.java:313)
>> at
>> com.codahale.metrics.graphite.PickledGraphite.writeMetrics(PickledGraphite.java:255)
>> at
>> com.codahale.metrics.graphite.PickledGraphite.send(PickledGraphite.java:213)
>> at
>> com.codahale.metrics.graphite.GraphiteReporter.reportGauge(GraphiteReporter.java:345)
>> at
>> com.codahale.metrics.graphite.GraphiteReporter.report(GraphiteReporter.java:243)
>> at
>> com.codahale.metrics.ScheduledReporter.report(ScheduledReporter.java:251)
>> at
>> com.codahale.metrics.ScheduledReporter$1.run(ScheduledReporter.java:174)
>> at
>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>> at
>> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>> at
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>> at
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>> at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>> at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>> at java.lang.Thread.run(Thread.java:745)
>> 2017-11-07 15:28:47.543 WARN  (metrics-graphite-reporter-3-thread-1) [   ]
>> c.c.m.g.GraphiteReporter Unable to report to Graphite
>> java.net.SocketException: Socket closed
>> at
>> java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:118)
>> at java.net.SocketOutputStream.write(SocketOutputStream.java:143)
>> at
>> com.codahale.metrics.graphite.PickledGraphite.writeMetrics(PickledGraphite.java:261)
>> at
>> com.codahale.metrics.graphite.PickledGraphite.send(PickledGraphite.java:213)
>> at
>> com.codahale.metrics.graphite.GraphiteReporter.sendIfEnabled(GraphiteReporter.java:328)
>> at
>> com.codahale.metrics.graphite.GraphiteReporter.reportTimer(GraphiteReporter.java:288)
>> at
>> com.codahale.metrics.graphite.GraphiteReporter.report(GraphiteReporter.java:259)
>> at
>> com.codahale.metrics.ScheduledReporter.report(ScheduledReporter.java:251)
>> at
>> com.codahale.metrics.ScheduledReporter$1.run(ScheduledReporter.java:174)
>> at
>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>> at
>> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>> at
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>> at
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>> at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>> at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>> at java.lang.Thread.run(Thread.java:745)
>> 2017-11-07 15:28:47.543 ERROR (metrics-graphite-reporter-1-thread-1) [   ]
>> c.c.m.ScheduledReporter Exception thrown from GraphiteReporter#report.
>> Exception was suppressed.
>> java.lang.NullPointerException
>> at java.util.LinkedList$ListItr.next(LinkedList.java:893)
>> at
>> com.codahale.metrics.graphite.PickledGraphite.pickleMetrics(PickledGraphite.java:305)
>> at
>> com.codahale.metrics.graphite.PickledGraphite.writeMetrics(PickledGraphite.java:255)
>> at
>> com.codahale.metrics.graphite.PickledGraphite.send(PickledGraphite.java:213)
>> at
>> com.codahale.metrics.graphite.GraphiteReporter.sendIfEnabled(GraphiteReporter.java:328)
>> at
>> com.codahale.metrics.graphite.GraphiteReporter.reportMetered(GraphiteReporter.java:304)
>> at
>> com.codahale.metrics.graphite.GraphiteReporter.report(GraphiteReporter.java:255)
>> at
>> com.codahale.metrics.ScheduledReporter.report(ScheduledReporter.java:251)
>> at
>> com.codahale.metrics.ScheduledReporter$1.run(ScheduledReporter.java:174)
>> at
>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>> at
>> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>> at
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>> at
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>> at
>> java.util.concurrent.Thread

Re: Long blocking during indexing + deleteByQuery

2017-11-07 Thread Shawn Heisey
On 11/5/2017 12:20 PM, Chris Troullis wrote:
> The issue I am seeing is when some
> threads are adding/updating documents while other threads are issuing
> deletes (using deleteByQuery), solr seems to get into a state of extreme
> blocking on the replica

The deleteByQuery operation cannot coexist very well with other indexing
operations.  Let me tell you about something I discovered.  I think your
problem is very similar.

Solr 4.0 and later is supposed to be able to handle indexing operations
at the same time that the index is being optimized (in Lucene,
forceMerge).  I have some indexes that take about two hours to optimize,
so having indexing stop while that happens is a less than ideal
situation.  Ongoing indexing is similar in many ways to a merge, enough
that it is handled by the same Merge Scheduler that handles an optimize.

I could indeed add documents to the index without issues at the same
time as an optimize, but when I would try my full indexing cycle while
an optimize was underway, I found that all operations stopped until the
optimize finished.

Ultimately what was determined (I think it was Yonik that figured it
out) was that *most* indexing operations can happen during the optimize,
*except* for deleteByQuery.  The deleteById operation works just fine.

I do not understand the low-level reasons for this, but apparently it's
not something that can be easily fixed.

A workaround is to send the query you plan to use with deleteByQuery as
a standard query with a limited fl parameter, to retrieve matching
uniqueKey values from the index, then do a deleteById with that list of
ID values instead.
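
A rough SolrJ sketch of that workaround (core URL, delete query, and page size
here are placeholders; for large result sets a cursorMark would page more safely):

import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public class DeleteByIdWorkaround {
  public static void main(String[] args) throws Exception {
    HttpSolrClient client = new HttpSolrClient.Builder(
        "http://localhost:8983/solr/mycollection").build();

    SolrQuery query = new SolrQuery("type_s:obsolete"); // the query you would have sent to deleteByQuery
    query.setFields("id");                              // limit fl to the uniqueKey field
    query.setRows(1000);

    List<String> ids = new ArrayList<>();
    int start = 0;
    while (true) {
      query.setStart(start);
      QueryResponse rsp = client.query(query);
      for (SolrDocument doc : rsp.getResults()) {
        ids.add((String) doc.getFieldValue("id"));
      }
      start += 1000;
      if (start >= rsp.getResults().getNumFound()) {
        break;
      }
    }

    if (!ids.isEmpty()) {
      client.deleteById(ids);   // deleteById sequences cleanly with concurrent adds
      client.commit();
    }
    client.close();
  }
}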

Thanks,
Shawn



Issues with Graphite reporter config

2017-11-07 Thread sudershan madhavan
Hi,
I am running Solrcloud version: 6.6.1
I have been trying to use graphite to report solr metrics and seem to get
the below error while doing so in the solr logs:

> java.lang.NullPointerException
> at
> com.codahale.metrics.graphite.PickledGraphite.pickleMetrics(PickledGraphite.java:313)
> at
> com.codahale.metrics.graphite.PickledGraphite.writeMetrics(PickledGraphite.java:255)
> at
> com.codahale.metrics.graphite.PickledGraphite.send(PickledGraphite.java:213)
> at
> com.codahale.metrics.graphite.GraphiteReporter.reportGauge(GraphiteReporter.java:345)
> at
> com.codahale.metrics.graphite.GraphiteReporter.report(GraphiteReporter.java:243)
> at
> com.codahale.metrics.ScheduledReporter.report(ScheduledReporter.java:251)
> at
> com.codahale.metrics.ScheduledReporter$1.run(ScheduledReporter.java:174)
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> 2017-11-07 15:28:47.543 WARN  (metrics-graphite-reporter-3-thread-1) [   ]
> c.c.m.g.GraphiteReporter Unable to report to Graphite
> java.net.SocketException: Socket closed
> at
> java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:118)
> at java.net.SocketOutputStream.write(SocketOutputStream.java:143)
> at
> com.codahale.metrics.graphite.PickledGraphite.writeMetrics(PickledGraphite.java:261)
> at
> com.codahale.metrics.graphite.PickledGraphite.send(PickledGraphite.java:213)
> at
> com.codahale.metrics.graphite.GraphiteReporter.sendIfEnabled(GraphiteReporter.java:328)
> at
> com.codahale.metrics.graphite.GraphiteReporter.reportTimer(GraphiteReporter.java:288)
> at
> com.codahale.metrics.graphite.GraphiteReporter.report(GraphiteReporter.java:259)
> at
> com.codahale.metrics.ScheduledReporter.report(ScheduledReporter.java:251)
> at
> com.codahale.metrics.ScheduledReporter$1.run(ScheduledReporter.java:174)
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> 2017-11-07 15:28:47.543 ERROR (metrics-graphite-reporter-1-thread-1) [   ]
> c.c.m.ScheduledReporter Exception thrown from GraphiteReporter#report.
> Exception was suppressed.
> java.lang.NullPointerException
> at java.util.LinkedList$ListItr.next(LinkedList.java:893)
> at
> com.codahale.metrics.graphite.PickledGraphite.pickleMetrics(PickledGraphite.java:305)
> at
> com.codahale.metrics.graphite.PickledGraphite.writeMetrics(PickledGraphite.java:255)
> at
> com.codahale.metrics.graphite.PickledGraphite.send(PickledGraphite.java:213)
> at
> com.codahale.metrics.graphite.GraphiteReporter.sendIfEnabled(GraphiteReporter.java:328)
> at
> com.codahale.metrics.graphite.GraphiteReporter.reportMetered(GraphiteReporter.java:304)
> at
> com.codahale.metrics.graphite.GraphiteReporter.report(GraphiteReporter.java:255)
> at
> com.codahale.metrics.ScheduledReporter.report(ScheduledReporter.java:251)
> at
> com.codahale.metrics.ScheduledReporter$1.run(ScheduledReporter.java:174)
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> (END)


Kindly let me know if this needs to be reported as a bug.







  


Re: Can someone help? Two level nested doc... ChildDocTransformerFactory syntax...

2017-11-07 Thread Petersen, Robert (Contr)
OK, although this was talked about as possibly coming in Solr 6.x, I guess it was 
hearsay. From what I can tell after rereading everything I can find on the 
subject, as of now the child docs are only retrievable as a one-level hierarchy 
when using the ChildDocTransformerFactory.




From: Petersen, Robert (Contr) 
Sent: Monday, November 6, 2017 5:05:31 PM
To: solr-user@lucene.apache.org
Subject: Can someone help? Two level nested doc... ChildDocTransformerFactory 
syntax...

OK no faceting, no filtering, I just want the hierarchy to come back in the 
results. Can't quite get it... googled all over the place too.


Doc:

{ id : asdf, type_s:customer, firstName_s:Manny, lastName_s:Acevedo, 
address_s:"123 Fourth Street", city_s:Gotham, tn_s:1234561234,
  _childDocuments_:[
  { id : adsf_c1,
src_s : "CRM.Customer",
type_s:customerSource,
_childDocuments_:[
{
id : asdf_c1_c1,
type_s:customerSourceType,
"key_s": "id",
"value_s": "GUID"
}
]
},
  { id : adsf_c2,
"src_s": "DPI.SalesOrder",
type_s:customerSource,
_childDocuments_:[
{
id : asdf_c2_c1,
type_s:customerSourceType,
"key_s": "btn",
"value_s": "4052328908"
},
{
id : asdf_c2_c2,
type_s:customerSourceType,
"key_s": "seq",
"value_s": "5"
   },
{
id : asdf_c2_c3,
type_s:customerSourceType,
"key_s": "env",
"value_s": "MS"
}
]
}
]
}


Queries:

Gives all nested docs regardless of level as a flat set
http://localhost:8983/solr/temptest/select?q=id:asdf&fl=id,[child%20parentFilter=type_s:customer]

Gives all nested child docs only
http://localhost:8983/solr/temptest/select?q=id:asdf&fl=id,[child%20parentFilter=type_s:customer%20childFilter=type_s:customerSource]

How to get nested grandchild docs at correct level?
Nope exception:
http://localhost:8983/solr/temptest/select?q=id:asdf&fl=id,[child%20parentFilter=type_s:customer%20childFilter=type_s:customerSource],[child%20parentFilter=type_s:customerSource%20childFilter=type_s:customerSourceType]

Nope exception:
http://localhost:8983/solr/temptest/select?q=id:asdf&fl=id,[child%20parentFilter=type_s:customer%20childFilter=type_s:customerSource],[child%20parentFilter=type_s:customerSource]


Nope but no exception only gets children again tho like above:
http://localhost:8983/solr/temptest/select?q=id:asdf&fl=id,[child%20parentFilter=type_s:customer%20childFilter=type_s:customerSource],[child%20parentFilter=type_s:customer*]

Nope but no exception only gets children 
again:

http://localhost:8983/solr/temptest/select?q=id:asdf&fl=id,[child%20parentFilter=type_s:customer%20childFilter=type_s:customerSource],[child%20parentFilter=type_s:customer*%20childFilter=type_s:customerSourceType]


Nope same again... no grandchildren:

http://localhost:8983/solr/temptest/select?q=id:asdf&fl=id,p:[child%20parentFilter=type_s:customer%20childFilter=type_s:customerSource],q:[child%20parentFilter=-type_s:customer%20parentFilter=type_s:customerSource%20childFilter=type_s:customerSourceType]


Gives all but flat no child to grandchild hierarchy:

http://localhost:8983/solr/temptest/select?q=id:asdf&fl=id,p:[child%20parentFilter=type_s:customer%20childFilter=type_s:customerSource],q:[child%20parentFilter=type_s:customer%20childFilter=type_s:customerSourceType]


Thanks in advance,

Robi





Re: Long blocking during indexing + deleteByQuery

2017-11-07 Thread Chris Troullis
If I am understanding you correctly, you think it is caused by the DBQ
deleting a document while a document with that same ID is being updated by
another thread? I'm not sure that is what is happening here, as we only
delete docs if they no longer exist in the DB, so nothing should be
adding/updating a doc with that ID if it is marked for deletion, as we
don't reuse IDs. I will double check though to confirm.

Also, not sure if relevant, but the DBQ itself returns very quickly, in a
matter of ms, it's the updates that block for a huge amount of time.

On Tue, Nov 7, 2017 at 11:08 AM, Amrit Sarkar 
wrote:

> Maybe not a relevant fact on this, but: "addAndDelete" is triggered by
> "*Reordering
> of DBQs'; *that means there are non-executed DBQs present in the updateLog
> and an add operation is also received. Solr makes sure DBQs are executed
> first and than add operation is executed.
>
> Amrit Sarkar
> Search Engineer
> Lucidworks, Inc.
> 415-589-9269
> www.lucidworks.com
> Twitter http://twitter.com/lucidworks
> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
> Medium: https://medium.com/@sarkaramrit2
>
> On Tue, Nov 7, 2017 at 9:19 PM, Erick Erickson 
> wrote:
>
> > Well, consider what happens here.
> >
> > Solr gets a DBQ that includes document 132 and 10,000,000 other docs
> > Solr gets an add for document 132
> >
> > The DBQ takes time to execute. If it was processing the requests in
> > parallel would 132 be in the index after the delete was over? It would
> > depend on when the DBQ found the doc relative to the add.
> > With this sequence one would expect 132 to be in the index at the end.
> >
> > And it's worse when it comes to distributed indexes. If the updates
> > were sent out in parallel you could end up in situations where one
> > replica contained 132 and another didn't depending on the vagaries of
> > thread execution.
> >
> > Now I didn't write the DBQ code, but that's what I think is happening.
> >
> > Best,
> > Erick
> >
> > On Tue, Nov 7, 2017 at 7:40 AM, Chris Troullis 
> > wrote:
> > > As an update, I have confirmed that it doesn't seem to have anything to
> > do
> > > with child documents, or standard deletes, just deleteByQuery. If I do
> a
> > > deleteByQuery on any collection while also adding/updating in separate
> > > threads I am experiencing this blocking behavior on the non-leader
> > replica.
> > >
> > > Has anyone else experienced this/have any thoughts on what to try?
> > >
> > > On Sun, Nov 5, 2017 at 2:20 PM, Chris Troullis 
> > wrote:
> > >
> > >> Hi,
> > >>
> > >> I am experiencing an issue where threads are blocking for an extremely
> > >> long time when I am indexing while deleteByQuery is also running.
> > >>
> > >> Setup info:
> > >> -Solr Cloud 6.6.0
> > >> -Simple 2 Node, 1 Shard, 2 replica setup
> > >> -~12 million docs in the collection in question
> > >> -Nodes have 64 GB RAM, 8 CPUs, spinning disks
> > >> -Soft commit interval 10 seconds, Hard commit (open searcher false) 60
> > >> seconds
> > >> -Default merge policy settings (Which I think is 10/10).
> > >>
> > >> We have a query heavy index heavyish use case. Indexing is constantly
> > >> running throughout the day and can be bursty. The indexing process
> > handles
> > >> both updates and deletes, can spin up to 15 simultaneous threads, and
> > sends
> > >> to solr in batches of 3000 (seems to be the optimal number per trial
> and
> > >> error).
> > >>
> > >> I can build the entire collection from scratch using this method in <
> 40
> > >> mins and indexing is in general super fast (averages about 3 seconds
> to
> > >> send a batch of 3000 docs to solr). The issue I am seeing is when some
> > >> threads are adding/updating documents while other threads are issuing
> > >> deletes (using deleteByQuery), solr seems to get into a state of
> extreme
> > >> blocking on the replica, which results in some threads taking 30+
> > minutes
> > >> just to send 1 batch of 3000 docs. This collection does use child
> > documents
> > >> (hence the delete by query _root_), not sure if that makes a
> > difference, I
> > >> am trying to duplicate on a non-child doc collection. CPU/IO wait
> seems
> > >> minimal on both nodes, so not sure what is causing the blocking.
> > >>
> > >> Here is part of the stack trace on one of the blocked threads on the
> > >> replica:
> > >>
> > >> qtp592179046-576 (576)
> > >> java.lang.Object@608fe9b5
> > >> org.apache.solr.update.DirectUpdateHandler2.addAndDelete(
> > >> DirectUpdateHandler2.java:354)
> > >> org.apache.solr.update.DirectUpdateHandler2.addDoc0(
> > >> DirectUpdateHandler2.java:237)
> > >> org.apache.solr.update.DirectUpdateHandler2.addDoc(
> > >> DirectUpdateHandler2.java:194)
> > >> org.apache.solr.update.processor.RunUpdateProcessor.processAdd(
> > >> RunUpdateProcessorFactory.java:67)
> > >> org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(
> > >> UpdateRequestProcessor.java:55)
> > >> org.apache.solr.update.processor.DistributedUpdateProcessor.
> doLoc

Re: Long blocking during indexing + deleteByQuery

2017-11-07 Thread Amrit Sarkar
Maybe not a relevant fact on this, but: "addAndDelete" is triggered by
"reordering of DBQs"; that means there are non-executed DBQs present in the
updateLog and an add operation is also received. Solr makes sure the DBQs are
executed first and then the add operation is executed.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2

On Tue, Nov 7, 2017 at 9:19 PM, Erick Erickson 
wrote:

> Well, consider what happens here.
>
> Solr gets a DBQ that includes document 132 and 10,000,000 other docs
> Solr gets an add for document 132
>
> The DBQ takes time to execute. If it was processing the requests in
> parallel would 132 be in the index after the delete was over? It would
> depend on when the DBQ found the doc relative to the add.
> With this sequence one would expect 132 to be in the index at the end.
>
> And it's worse when it comes to distributed indexes. If the updates
> were sent out in parallel you could end up in situations where one
> replica contained 132 and another didn't depending on the vagaries of
> thread execution.
>
> Now I didn't write the DBQ code, but that's what I think is happening.
>
> Best,
> Erick
>
> On Tue, Nov 7, 2017 at 7:40 AM, Chris Troullis 
> wrote:
> > As an update, I have confirmed that it doesn't seem to have anything to
> do
> > with child documents, or standard deletes, just deleteByQuery. If I do a
> > deleteByQuery on any collection while also adding/updating in separate
> > threads I am experiencing this blocking behavior on the non-leader
> replica.
> >
> > Has anyone else experienced this/have any thoughts on what to try?
> >
> > On Sun, Nov 5, 2017 at 2:20 PM, Chris Troullis 
> wrote:
> >
> >> Hi,
> >>
> >> I am experiencing an issue where threads are blocking for an extremely
> >> long time when I am indexing while deleteByQuery is also running.
> >>
> >> Setup info:
> >> -Solr Cloud 6.6.0
> >> -Simple 2 Node, 1 Shard, 2 replica setup
> >> -~12 million docs in the collection in question
> >> -Nodes have 64 GB RAM, 8 CPUs, spinning disks
> >> -Soft commit interval 10 seconds, Hard commit (open searcher false) 60
> >> seconds
> >> -Default merge policy settings (Which I think is 10/10).
> >>
> >> We have a query heavy index heavyish use case. Indexing is constantly
> >> running throughout the day and can be bursty. The indexing process
> handles
> >> both updates and deletes, can spin up to 15 simultaneous threads, and
> sends
> >> to solr in batches of 3000 (seems to be the optimal number per trial and
> >> error).
> >>
> >> I can build the entire collection from scratch using this method in < 40
> >> mins and indexing is in general super fast (averages about 3 seconds to
> >> send a batch of 3000 docs to solr). The issue I am seeing is when some
> >> threads are adding/updating documents while other threads are issuing
> >> deletes (using deleteByQuery), solr seems to get into a state of extreme
> >> blocking on the replica, which results in some threads taking 30+
> minutes
> >> just to send 1 batch of 3000 docs. This collection does use child
> documents
> >> (hence the delete by query _root_), not sure if that makes a
> difference, I
> >> am trying to duplicate on a non-child doc collection. CPU/IO wait seems
> >> minimal on both nodes, so not sure what is causing the blocking.
> >>
> >> Here is part of the stack trace on one of the blocked threads on the
> >> replica:
> >>
> >> qtp592179046-576 (576)
> >> java.lang.Object@608fe9b5
> >> org.apache.solr.update.DirectUpdateHandler2.addAndDelete(
> >> DirectUpdateHandler2.java:354)
> >> org.apache.solr.update.DirectUpdateHandler2.addDoc0(
> >> DirectUpdateHandler2.java:237)
> >> org.apache.solr.update.DirectUpdateHandler2.addDoc(
> >> DirectUpdateHandler2.java:194)
> >> org.apache.solr.update.processor.RunUpdateProcessor.processAdd(
> >> RunUpdateProcessorFactory.java:67)
> >> org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(
> >> UpdateRequestProcessor.java:55)
> >> org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(
> >> DistributedUpdateProcessor.java:979)
> >> org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(
> >> DistributedUpdateProcessor.java:1192)
> >> org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(
> >> DistributedUpdateProcessor.java:748)
> >> org.apache.solr.handler.loader.JavabinLoader$1.update
> >> (JavabinLoader.java:98)
> >> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.
> >> readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:180)
> >> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.
> >> readIterator(JavaBinUpdateRequestCodec.java:136)
> >> org.apache.solr.common.util.JavaBinCodec.readObject(
> >> JavaBinCodec.java:306)
> >> org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:251)
> >> org.apache.solr.client.solr

App Studio - Preview Release

2017-11-07 Thread Sushilkumar Deshmukh
Hi Team,

I heard news that App Studio for Solr and ElasticSearch being released by
Lucidworks in November. In the mail it was mentioned to drop mail to
solr-user DL if we like to get a preview release.

Could you please share preview release with me.

Thanks,
Sushil


Re: use mutiple ssd in solr cloud

2017-11-07 Thread Erick Erickson
There's been discussion on the Solr JIRA list about allowing multiple
"roots" for cores although I can't find it right now.

Meanwhile, what people do is specify dataDir. It's a bit clumsy since
we can't really do this at the collection level; it needs to be done with
ADDREPLICA for each replica individually.
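
Something like this for each replica you add (host, collection and paths are
made up for the example):

http://localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=myCollection&shard=shard1&node=192.168.1.5:8983_solr&dataDir=/media/ssd2/myCollection_shard1_replica2

IIRC both dataDir and instanceDir are accepted per ADDREPLICA call, which is
how you spread replicas over several mount points.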

Best,
Erick

On Tue, Nov 7, 2017 at 7:43 AM, simon  wrote:
> I don't think there's any way to do that within Solr. If you're using
> Linux, the Logical Volume Manager can be used to create a single volume
> from multiple devices (RAID), from which you can create partitions/file
> systems as required. There may be equivalent Windows functionality - I
> can't say.
>
> best
>
> -Simon
>
> On Tue, Nov 7, 2017 at 1:44 AM, Amin Raeiszadeh 
> wrote:
>
>> Hi
>> I want to use more than one SSD in each server of the Solr cluster, but I
>> don't know how to set multiple disk paths in the solr.xml configuration.
>> I set one SSD path in solr.xml with:
>> /media/ssd
>> but I can't set more than one SSD.
>> How should I do it?
>> Thanks.
>>


Re: Long blocking during indexing + deleteByQuery

2017-11-07 Thread Erick Erickson
Well, consider what happens here.

Solr gets a DBQ that includes document 132 and 10,000,000 other docs
Solr gets an add for document 132

The DBQ takes time to execute. If it was processing the requests in
parallel would 132 be in the index after the delete was over? It would
depend on when the DBQ found the doc relative to the add.
With this sequence one would expect 132 to be in the index at the end.

And it's worse when it comes to distributed indexes. If the updates
were sent out in parallel you could end up in situations where one
replica contained 132 and another didn't depending on the vagaries of
thread execution.

Now I didn't write the DBQ code, but that's what I think is happening.

Best,
Erick

On Tue, Nov 7, 2017 at 7:40 AM, Chris Troullis  wrote:
> As an update, I have confirmed that it doesn't seem to have anything to do
> with child documents, or standard deletes, just deleteByQuery. If I do a
> deleteByQuery on any collection while also adding/updating in separate
> threads I am experiencing this blocking behavior on the non-leader replica.
>
> Has anyone else experienced this/have any thoughts on what to try?
>
> On Sun, Nov 5, 2017 at 2:20 PM, Chris Troullis  wrote:
>
>> Hi,
>>
>> I am experiencing an issue where threads are blocking for an extremely
>> long time when I am indexing while deleteByQuery is also running.
>>
>> Setup info:
>> -Solr Cloud 6.6.0
>> -Simple 2 Node, 1 Shard, 2 replica setup
>> -~12 million docs in the collection in question
>> -Nodes have 64 GB RAM, 8 CPUs, spinning disks
>> -Soft commit interval 10 seconds, Hard commit (open searcher false) 60
>> seconds
>> -Default merge policy settings (Which I think is 10/10).
>>
>> We have a query heavy index heavyish use case. Indexing is constantly
>> running throughout the day and can be bursty. The indexing process handles
>> both updates and deletes, can spin up to 15 simultaneous threads, and sends
>> to solr in batches of 3000 (seems to be the optimal number per trial and
>> error).
>>
>> I can build the entire collection from scratch using this method in < 40
>> mins and indexing is in general super fast (averages about 3 seconds to
>> send a batch of 3000 docs to solr). The issue I am seeing is when some
>> threads are adding/updating documents while other threads are issuing
>> deletes (using deleteByQuery), solr seems to get into a state of extreme
>> blocking on the replica, which results in some threads taking 30+ minutes
>> just to send 1 batch of 3000 docs. This collection does use child documents
>> (hence the delete by query _root_), not sure if that makes a difference, I
>> am trying to duplicate on a non-child doc collection. CPU/IO wait seems
>> minimal on both nodes, so not sure what is causing the blocking.
>>
>> Here is part of the stack trace on one of the blocked threads on the
>> replica:
>>
>> qtp592179046-576 (576)
>> java.lang.Object@608fe9b5
>> org.apache.solr.update.DirectUpdateHandler2.addAndDelete(
>> DirectUpdateHandler2.java:354)
>> org.apache.solr.update.DirectUpdateHandler2.addDoc0(
>> DirectUpdateHandler2.java:237)
>> org.apache.solr.update.DirectUpdateHandler2.addDoc(
>> DirectUpdateHandler2.java:194)
>> org.apache.solr.update.processor.RunUpdateProcessor.processAdd(
>> RunUpdateProcessorFactory.java:67)
>> org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(
>> UpdateRequestProcessor.java:55)
>> org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(
>> DistributedUpdateProcessor.java:979)
>> org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(
>> DistributedUpdateProcessor.java:1192)
>> org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(
>> DistributedUpdateProcessor.java:748)
>> org.apache.solr.handler.loader.JavabinLoader$1.update
>> (JavabinLoader.java:98)
>> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.
>> readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:180)
>> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.
>> readIterator(JavaBinUpdateRequestCodec.java:136)
>> org.apache.solr.common.util.JavaBinCodec.readObject(
>> JavaBinCodec.java:306)
>> org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:251)
>> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.
>> readNamedList(JavaBinUpdateRequestCodec.java:122)
>> org.apache.solr.common.util.JavaBinCodec.readObject(
>> JavaBinCodec.java:271)
>> org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:251)
>> org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:173)
>> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(
>> JavaBinUpdateRequestCodec.java:187)
>> org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(
>> JavabinLoader.java:108)
>> org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:55)
>> org.apache.solr.handler.UpdateRequestHandler$1.load(
>> UpdateRequestHandler.java:97)
>> org.apache.solr.handler.ContentSt

Re: use mutiple ssd in solr cloud

2017-11-07 Thread simon
I don't think there's any way to do that within Solr. If you're using
Linux, the Logical Volume Manager can be used to create a single volume
from multiple devices (RAID), from which you can create partitions/file
systems as required. There may be equivalent Windows functionality - I
can't say.

best

-Simon

On Tue, Nov 7, 2017 at 1:44 AM, Amin Raeiszadeh 
wrote:

> Hi
> I want to use more than one SSD in each server of the Solr cluster, but I
> don't know how to set multiple disk paths in the solr.xml configuration.
> I set one SSD path in solr.xml with:
> /media/ssd
> but I can't set more than one SSD.
> How should I do it?
> Thanks.
>


Re: Faceting Word Count

2017-11-07 Thread Erick Erickson
bq: 10k as a max number of rows.

This doesn't matter. In order to facet on the word count, Solr has to
be prepared to facet on all possible docs. For all Solr knows, a
_single_ document may contain every word so the size of the structure
that contains the counters has to be prepared for N buckets, where N
is the total number of distinct words in the entire corpus.

You'll really have to find an alternative approach, somehow restrict
the choices etc. I think.
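
If you only need counts for specific words rather than the whole cloud, the
termfreq function I mentioned earlier stays cheap even at 100M docs. For
example (field and term names invented for illustration):

q=user_id:12345 AND post_date:[2017-01-01T00:00:00Z TO 2017-02-01T00:00:00Z]&fl=id,cnt:termfreq(text,'solr')

That returns the per-document count of one term without building the huge
uninverted structure that faceting the whole field needs.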

Best,
Erick

On Tue, Nov 7, 2017 at 12:26 AM, Wael Kader  wrote:
> Hi,
>
> The whole index has 100M but when I add the criteria, it will filter the
> data to maybe 10k as a max number of rows.
> The facet isn't working when the total number of records in the index is
> 100M but it was working at 5M.
>
> I have social media & RSS data in the index and I am trying to get the word
> count for a specific user on specific date intervals.
>
> Regards,
> Wael
>
> On Mon, Nov 6, 2017 at 3:42 PM, Erick Erickson 
> wrote:
>
>> _Why_ do you want to get the word counts? Faceting on all of the
>> tokens for 100M docs isn't something Solr is ordinarily used for. As
>> Emir says it'll take a huge amount of memory. You can use one of the
>> function queries (termfreq IIRC) that will give you the count of any
>> individual term you have and will be very fast.
>>
>> But getting all of the word counts in the index is probably not
>> something I'd use Solr for.
>>
>> This may be an XY problem, you're asking how to do something specific
>> (X) without explaining what the problem you're trying to solve is (Y).
>> Perhaps there's another way to accomplish (Y) if we knew more about
>> what it is.
>>
>> Best,
>> Erick
>>
>>
>>
>> On Mon, Nov 6, 2017 at 4:15 AM, Emir Arnautović
>>  wrote:
>> > Hi Wael,
>> > You are faceting on analyzed field. This results in field being
>> uninverted - fieldValueCache being built - on first call after every
>> commit. This is both time and memory consuming (you can check in admin
>> console in stats how much memory it took).
>> > What you need to do is to create multivalue string field (not text) and
>> parse values (do analysis steps) on client side and store it like that.
>> This will allow you to enable docValues on that field and avoid building
>> fieldValueCache.
>> >
>> > HTH,
>> > Emir
>> > --
>> > Monitoring - Log Management - Alerting - Anomaly Detection
>> > Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>> >
>> >
>> >
>> >> On 6 Nov 2017, at 13:06, Wael Kader  wrote:
>> >>
>> >> Hi,
>> >>
>> >> I am using a custom field. Below is the field definition.
>> >> I am using this because I don't want stemming.
>> >>
>> >>
>> >>> >> positionIncrementGap="100">
>> >>  
>> >>> >> mapping="mapping-ISOLatin1Accent.txt"/>
>> >>
>> >>
>> >>> >>ignoreCase="true"
>> >>words="stopwords.txt"
>> >>enablePositionIncrements="true"
>> >>/>
>> >>> >>protected="protwords.txt"
>> >>generateWordParts="0"
>> >>generateNumberParts="1"
>> >>catenateWords="1"
>> >>catenateNumbers="1"
>> >>catenateAll="0"
>> >>splitOnCaseChange="1"
>> >>preserveOriginal="1"/>
>> >>
>> >>
>> >>
>> >>  
>> >>  
>> >>> >> mapping="mapping-ISOLatin1Accent.txt"/>
>> >>
>> >>> synonyms="synonyms.txt"
>> >> ignoreCase="true" expand="true"/>
>> >>> >>ignoreCase="true"
>> >>words="stopwords.txt"
>> >>enablePositionIncrements="true"
>> >>/>
>> >> 
>> >>> >>protected="protwords.txt"
>> >>generateWordParts="0"
>> >>catenateWords="0"
>> >>catenateNumbers="0"
>> >>catenateAll="0"
>> >>splitOnCaseChange="1"
>> >>preserveOriginal="1"/>
>> >>
>> >>
>> >>
>> >>
>> >>  
>> >>
>> >>
>> >>
>> >> Regards,
>> >> Wael
>> >>
>> >> On Mon, Nov 6, 2017 at 10:29 AM, Emir Arnautović <
>> >> emir.arnauto...@sematext.com> wrote:
>> >>
>> >>> Hi Wael,
>> >>> Can you provide your field definition and sample query.
>> >>>
>> >>> Thanks,
>> >>> Emir
>> >>> --
>> >>> Monitoring - Log Management - Alerting - Anomaly Detection
>> >>> Solr & Elasticsearch Consulting Support Training -
>> http://sematext.com/
>> >>>
>> >>>
>> >>>
>>  On 6 Nov 2017, at 08:30, Wael Kader  wrote:
>> 
>>  Hello,
>> 
>>  I am having an index with around 100 Million documents.
>>  I have a multivalued column that I am saving big chunks of text data
>> in.
>> >>> It
>>  has around 20 GB of RAM and 4 CPU's.
>> 
>>  I was doing faceting on it to get word cloud but it was taking around
>> 1
>>  second to retrieve when the data was 5-10 Million .
>>  Now I have more data and 

Re: Long blocking during indexing + deleteByQuery

2017-11-07 Thread Chris Troullis
As an update, I have confirmed that it doesn't seem to have anything to do
with child documents, or standard deletes, just deleteByQuery. If I do a
deleteByQuery on any collection while also adding/updating in separate
threads I am experiencing this blocking behavior on the non-leader replica.

Has anyone else experienced this/have any thoughts on what to try?

On Sun, Nov 5, 2017 at 2:20 PM, Chris Troullis  wrote:

> Hi,
>
> I am experiencing an issue where threads are blocking for an extremely
> long time when I am indexing while deleteByQuery is also running.
>
> Setup info:
> -Solr Cloud 6.6.0
> -Simple 2 Node, 1 Shard, 2 replica setup
> -~12 million docs in the collection in question
> -Nodes have 64 GB RAM, 8 CPUs, spinning disks
> -Soft commit interval 10 seconds, Hard commit (open searcher false) 60
> seconds
> -Default merge policy settings (Which I think is 10/10).
>
> We have a query heavy index heavyish use case. Indexing is constantly
> running throughout the day and can be bursty. The indexing process handles
> both updates and deletes, can spin up to 15 simultaneous threads, and sends
> to solr in batches of 3000 (seems to be the optimal number per trial and
> error).
>
> I can build the entire collection from scratch using this method in < 40
> mins and indexing is in general super fast (averages about 3 seconds to
> send a batch of 3000 docs to solr). The issue I am seeing is when some
> threads are adding/updating documents while other threads are issuing
> deletes (using deleteByQuery), solr seems to get into a state of extreme
> blocking on the replica, which results in some threads taking 30+ minutes
> just to send 1 batch of 3000 docs. This collection does use child documents
> (hence the delete by query _root_), not sure if that makes a difference, I
> am trying to duplicate on a non-child doc collection. CPU/IO wait seems
> minimal on both nodes, so not sure what is causing the blocking.
>
> Here is part of the stack trace on one of the blocked threads on the
> replica:
>
> qtp592179046-576 (576)
> java.lang.Object@608fe9b5
> org.apache.solr.update.DirectUpdateHandler2.addAndDelete(
> DirectUpdateHandler2.java:354)
> org.apache.solr.update.DirectUpdateHandler2.addDoc0(
> DirectUpdateHandler2.java:237)
> org.apache.solr.update.DirectUpdateHandler2.addDoc(
> DirectUpdateHandler2.java:194)
> org.apache.solr.update.processor.RunUpdateProcessor.processAdd(
> RunUpdateProcessorFactory.java:67)
> org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(
> UpdateRequestProcessor.java:55)
> org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(
> DistributedUpdateProcessor.java:979)
> org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(
> DistributedUpdateProcessor.java:1192)
> org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(
> DistributedUpdateProcessor.java:748)
> org.apache.solr.handler.loader.JavabinLoader$1.update
> (JavabinLoader.java:98)
> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.
> readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:180)
> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.
> readIterator(JavaBinUpdateRequestCodec.java:136)
> org.apache.solr.common.util.JavaBinCodec.readObject(
> JavaBinCodec.java:306)
> org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:251)
> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.
> readNamedList(JavaBinUpdateRequestCodec.java:122)
> org.apache.solr.common.util.JavaBinCodec.readObject(
> JavaBinCodec.java:271)
> org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:251)
> org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:173)
> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(
> JavaBinUpdateRequestCodec.java:187)
> org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(
> JavabinLoader.java:108)
> org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:55)
> org.apache.solr.handler.UpdateRequestHandler$1.load(
> UpdateRequestHandler.java:97)
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(
> ContentStreamHandlerBase.java:68)
> org.apache.solr.handler.RequestHandlerBase.handleRequest(
> RequestHandlerBase.java:173)
> org.apache.solr.core.SolrCore.execute(SolrCore.java:2477)
> org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:723)
> org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:529)
>
> A cursory search led me to this JIRA:
> https://issues.apache.org/jira/browse/SOLR-7836, not sure if related though.
>
> Can anyone shed some light on this issue? We don't do deletes very
> frequently, but it is bringing Solr to its knees when we do, which is
> causing some big problems.
>
> Thanks,
>
> Chris
>


Re: Solr snakeyaml Error Problem

2017-11-07 Thread Erick Erickson
Caused by: java.lang.ClassNotFoundException: org.yaml.snakeyaml.Yaml

You haven't included anything that tells Solr where that file is. You've
included

  <lib dir="..." regex="cassandra-jdbc-driver-0.6.4.jar" />

but that specifically loads the jar file. Try a regex pattern assuming
snakeyaml.Yaml is co-located with cassandra-jdbc-driver-0.6.4.jar
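
For example (the directory path is just a guess, point it at wherever that
jar actually lives):

<lib dir="/path/to/cassandra/driver/libs" regex="snakeyaml-\d.*\.jar" />

or simply widen the regex on the existing directive so all of the driver's
dependency jars get picked up.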


Best,
Erick.

On Tue, Nov 7, 2017 at 5:17 AM, Can Ezgi Aydemir 
wrote:

> Hi everybody,
>
>
>
> I am trying a Cassandra-Solr integration. I configured the Solr files
> dataconfig.xml, solrconfig.xml and managed-schema, but Solr does not
> connect to Cassandra and throws the snakeyaml error below:
>
>
>
> Exception in thread "Thread-18" java.lang.NoClassDefFoundError:
> org/yaml/snakeyaml/Yaml
>
> at com.github.cassandra.jdbc.CassandraConfiguration.(
> CassandraConfiguration.java:167)
>
> at com.github.cassandra.jdbc.CassandraDriver.acceptsURL(
> CassandraDriver.java:103)
>
> at com.github.cassandra.jdbc.CassandraDriver.connect(
> CassandraDriver.java:107)
>
> at java.sql.DriverManager.getConnection(DriverManager.java:664)
>
> at java.sql.DriverManager.getConnection(DriverManager.java:208)
>
>at org.apache.solr.handler.dataimport.JdbcDataSource$1.
> call(JdbcDataSource.java:185)
>
> at org.apache.solr.handler.dataimport.JdbcDataSource$1.
> call(JdbcDataSource.java:172)
>
> at org.apache.solr.handler.dataimport.JdbcDataSource.
> getConnection(JdbcDataSource.java:528)
>
> at org.apache.solr.handler.dataimport.JdbcDataSource$
> ResultSetIterator.(JdbcDataSource.java:317)
>
> at org.apache.solr.handler.dataimport.JdbcDataSource.
> createResultSetIterator(JdbcDataSource.java:288)
>
> at org.apache.solr.handler.dataimport.JdbcDataSource.
> getData(JdbcDataSource.java:283)
>
> at org.apache.solr.handler.dataimport.JdbcDataSource.
> getData(JdbcDataSource.java:52)
>
> at org.apache.solr.handler.dataimport.SqlEntityProcessor.
> initQuery(SqlEntityProcessor.java:59)
>
> at org.apache.solr.handler.dataimport.SqlEntityProcessor.
> nextRow(SqlEntityProcessor.java:73)
>
> at org.apache.solr.handler.dataimport.EntityProcessorWrapper.
> nextRow(EntityProcessorWrapper.java:267)
>
> at org.apache.solr.handler.dataimport.DocBuilder.
> buildDocument(DocBuilder.java:476)
>
> at org.apache.solr.handler.dataimport.DocBuilder.
> buildDocument(DocBuilder.java:415)
>
> at org.apache.solr.handler.dataimport.DocBuilder.
> doFullDump(DocBuilder.java:330)
>
> at org.apache.solr.handler.dataimport.DocBuilder.execute(
> DocBuilder.java:233)
>
> at org.apache.solr.handler.dataimport.DataImporter.
> doFullImport(DataImporter.java:415)
>
> at org.apache.solr.handler.dataimport.DataImporter.
> runCmd(DataImporter.java:474)
>
> at org.apache.solr.handler.dataimport.DataImporter.
> lambda$runAsync$0(DataImporter.java:457)
>
> at java.lang.Thread.run(Thread.java:748)
>
> Caused by: java.lang.ClassNotFoundException: org.yaml.snakeyaml.Yaml
>
> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>
> at org.eclipse.jetty.webapp.WebAppClassLoader.loadClass(
> WebAppClassLoader.java:487)
>
> at org.eclipse.jetty.webapp.WebAppClassLoader.loadClass(
> WebAppClassLoader.java:428)
>
>
>
>
>
> Dataconfig file;
>
>
>
> 
>
>  driver="com.github.cassandra.jdbc.CassandraDriver"
> url="jdbc:cassandra://192.168.1.19:9160/activitymanager"
> autoCommit="true"/>
>
> 
>
>  autoCommit="true">
>
> 
>
> 
>
> 
>
> 
>
>
>
>
>
> And solr config;
>
> <lib dir="..." regex="solr-dataimporthandler-\d.*\.jar" />
> <lib dir="..." regex="solr-dataimporthandler-extras-\d.*\.jar" />
> <lib dir="..." regex="cassandra-jdbc-driver-0.6.4.jar" />
>
> ……….
>
>
>
> <requestHandler name="/dataimport"
>  class="org.apache.solr.handler.dataimport.DataImportHandler">
>   <lst name="defaults">
>     <str name="config">/var/solr/data/a/conf/dataconfig.xml</str>
>   </lst>
> </requestHandler>
>
>
>
>
>
> Best regards.
>
>
>
> Thx for reply.
>
>
>
>
> Can Ezgi Aydemir
> Oracle Veri Tabanı Yöneticisi & Oracle Database Admin
> İşlem Coğrafi Bilgi Sistemleri Müh. & Eğitim AŞ.
> 2024. Cadde No:14, Beysukent 06800, Ankara, Türkiye
> T : 0 312 233 50 00 .:. F : 0312 235 56 82
> E : cayde...@islem.com.tr .:. W : http://www.islem.com.tr
>
>
>
>

Re: Solr 7* Sorry, no dataimport-handler defined

2017-11-07 Thread Shawn Heisey
On 11/7/2017 6:49 AM, richardg wrote:
> vs on the master that shows the error.
>
> 2017-11-07 13:29:14.131 INFO  (qtp1839206329-36) [  
> x:solr_aggregate_production] o.a.s.c.S.Request [solr_aggregate_production] 
> webapp=/solr path=/admin/mbeans
> params={cat=QUERYHANDLER&wt=json&_=1510061366718} status=0 QTime=2

The string "QUERYHANDLER" (all uppercase) only shows up in a 7.1.0
source code checkout in the reference guide, it is not in any code that
builds the program.  Its presence in the reference guide is likely a
documentation error.

If you are seeing QUERYHANDLER in a log for version 7.1.0, then I have
to wonder exactly how you did the upgrade -- because I think there are
only two ways that could happen:  1) Your 7.1.0 install includes at
least some files from a version before 6.4.1.  2) You've got something
(perhaps a load balancer) mixing up requests between two different
versions of Solr.
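
One way to narrow it down (core name taken from your log lines, adjust as
needed) is to bypass the admin UI and hit the mbeans endpoint directly on the
master:

http://localhost:8983/solr/solr_aggregate_production/admin/mbeans?cat=QUERY&wt=json

A real 7.1.0 core should answer that with the QUERY category populated; if
the UI on that host is still sending cat=QUERYHANDLER, the javascript serving
that page is probably not from 7.1.0.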

Thanks,
Shawn



Re: Incorrect ngroup count

2017-11-07 Thread Amrit Sarkar
Zheng,

Usually, the number of records returned is more than what is shown in the
> ngroup. For example, I may get a ngroup of 22, but there are 25 records
> being returned.


Do the 25 records being returned have duplicates? Grouping is subject to
the data for a given group value being co-located on the same shard. Can you
share the architecture of the setup?
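
If the collection has more than one shard, a quick way to check co-location
is to query each shard directly with distrib=false and see whether the same
signature value shows up on more than one shard, e.g. (host and core names
are just an example, adjust to your setup):

http://host:8983/solr/collection1_shard1_replica1/select?q=signature:<some_value>&fl=id,signature&distrib=false

If documents sharing a signature are spread across shards, the grouped counts
can disagree with the number of documents returned.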


Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2

On Tue, Nov 7, 2017 at 8:36 AM, Zheng Lin Edwin Yeo 
wrote:

> Hi,
>
> I'm using Solr 6.5.1, and I'm facing the issue of incorrect ngroup count
> after I have group it by signature field.
>
> Usually, the number of records returned is more than what is shown in the
> ngroup. For example, I may get a ngroup of 22, but there are 25 records
> being returned.
>
> Below is the part of solrconfig.xml that does the grouping.
>
>   <processor class="solr.processor.SignatureUpdateProcessorFactory">
>     <bool name="enabled">true</bool>
>     <str name="signatureField">signature</str>
>     <bool name="overwriteDupes">false</bool>
>     <str name="fields">content</str>
>     <str name="signatureClass">solr.processor.Lookup3Signature</str>
>   </processor>
>   <processor class="solr.DistributedUpdateProcessorFactory" />
>   <processor class="solr.LogUpdateProcessorFactory" />
>   <processor class="solr.RunUpdateProcessorFactory" />
>
>
> This is where I set the grouping to true in the requestHandler
>
> <bool name="group">true</bool>
> <str name="group.field">signature</str>
> <bool name="group.main">true</bool>
> <str name="group.cache.percent">100</str>
>
> What could be the issue that causes this?
>
> Regards,
> Edwin
>


Re: Solr 7* Sorry, no dataimport-handler defined

2017-11-07 Thread richardg
Yes I am referring to the dataimport tab in the admin UI and issue
SOLR-10035.  My previous setup w/ 6.3 did not show this error.  I then
upgraded to 7.1.0 and the error shows.  I upgraded(downgraded) to versions
6.5.0 and 6.6.2 and I do not see the error.  Version 7.0.1 also shows the
error for me.  I am currently using version 6.6.2 and have been successfully
able to run a data import from the admin UI. 

In my config directory we have 

solrcore.properties
solrconfig.xml which defines the dataimport handler (data-config.xml)
schema.xml
dataimport.properties
data-config.xml
some admin-extra*.html files

We copy all the config files over to the slave instances and they do not show
this behavior on 7.1.0; the dataimport tab loads fine.  The only thing I notice
is on the slaves I see entries like this in the log:

2017-11-07 13:36:11.200 INFO  (qtp2053591126-35) [  
x:solr_aggregate_production] o.a.s.c.S.Request [solr_aggregate_production] 
webapp=/solr path=/admin/mbeans params={cat=QUERY&wt=json&_=1510061783971}
status=0 QTime=0

vs on the master that shows the error.

2017-11-07 13:29:14.131 INFO  (qtp1839206329-36) [  
x:solr_aggregate_production] o.a.s.c.S.Request [solr_aggregate_production] 
webapp=/solr path=/admin/mbeans
params={cat=QUERYHANDLER&wt=json&_=1510061366718} status=0 QTime=2

I see just "QUERY" in the slave that is working and "QUERYHANDLER" in the
master that isn't.  This is why I referenced the issue w/ 6.4 (SOLR-10035). 
Other than that I do not see anything in the log showing and error for the
dataimport handler.

Thanks



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Solr snakeyaml Error Problem

2017-11-07 Thread Can Ezgi Aydemir
Hi everybody,

I am trying a Cassandra-Solr integration. I configured the Solr files
dataconfig.xml, solrconfig.xml and managed-schema, but Solr does not connect
to Cassandra and throws the snakeyaml error below:

Exception in thread "Thread-18" java.lang.NoClassDefFoundError: 
org/yaml/snakeyaml/Yaml
at 
com.github.cassandra.jdbc.CassandraConfiguration.(CassandraConfiguration.java:167)
at 
com.github.cassandra.jdbc.CassandraDriver.acceptsURL(CassandraDriver.java:103)
at 
com.github.cassandra.jdbc.CassandraDriver.connect(CassandraDriver.java:107)
at java.sql.DriverManager.getConnection(DriverManager.java:664)
at java.sql.DriverManager.getConnection(DriverManager.java:208)
   at 
org.apache.solr.handler.dataimport.JdbcDataSource$1.call(JdbcDataSource.java:185)
at 
org.apache.solr.handler.dataimport.JdbcDataSource$1.call(JdbcDataSource.java:172)
at 
org.apache.solr.handler.dataimport.JdbcDataSource.getConnection(JdbcDataSource.java:528)
at 
org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.(JdbcDataSource.java:317)
at 
org.apache.solr.handler.dataimport.JdbcDataSource.createResultSetIterator(JdbcDataSource.java:288)
at 
org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:283)
at 
org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:52)
at 
org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59)
at 
org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
at 
org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:267)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:476)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:415)
at 
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:330)
at 
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:233)
at 
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:415)
at 
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:474)
at 
org.apache.solr.handler.dataimport.DataImporter.lambda$runAsync$0(DataImporter.java:457)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: org.yaml.snakeyaml.Yaml
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at 
org.eclipse.jetty.webapp.WebAppClassLoader.loadClass(WebAppClassLoader.java:487)
at 
org.eclipse.jetty.webapp.WebAppClassLoader.loadClass(WebAppClassLoader.java:428)


Dataconfig file;











And solr config;

  
  
……….



<requestHandler name="/dataimport"
 class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">/var/solr/data/a/conf/dataconfig.xml</str>
  </lst>
</requestHandler>


Best regards.

Thx for reply.


Can Ezgi Aydemir
Oracle Veri Tabanı Yöneticisi & Oracle Database Admin

İşlem Coğrafi Bilgi Sistemleri Müh. & Eğitim AŞ.
2024.Cadde No:14, Beysukent 06800, Ankara, Türkiye
T : 0 312 233 50 00 .:. F : 0312 235 56 82
E : cayde...@islem.com.tr .:. W : http://www.islem.com.tr





Re: recent utf8 problems

2017-11-07 Thread Rick Leir
Dr Krell
Item 11): It is best to get the solrconfig.xml provided with the new version of 
Solr, and change it to suit your needs. Do not try to work from the old 
version's solrconfig.xml.

I did not have time to read the other items. 

Look in solr.log, and compare the successful query with the unsuccessful one 
for clues, then look at the config for /select again.
Cheers -- Rick

On November 7, 2017 12:43:00 AM EST, "Dr. Mario Michael Krell" 
 wrote:
>Hi,
>
>thank you for your time and trying to narrow down my problem.
>
>1) When looking for Tübingen in the title, I am expecting the 3092484
>results. That sounds like a reasonable result. Furthermore, when
>looking at some of the results, they are exactly what I am looking for.
>
>2) I am testing them against the same solr server. This is a very
>simple testing setup, that brings our problem to the core. Originally,
>we used a urlib.request.urlopen query to get the data in Python and
>then send it to our webpage (http://search.mmcommons.org/) as a json
>object. I think, I should explain my test more clearly. We use a
>webbrowser (Firefox or Chrome) to open the admin console of the search
>engine, which is at http://localhost:8983/solr/#/mmc_search3/query
> on my local device.
>This is the default behavior. In this webbrowser, I use the query 
>"title:T%C3%BCbingen” in the field “g” with /select as the
>“Request-Handler (qt) <>”.This approach works like a charm (result wich
>echoParams attached). Also as asked by Rick, the request url displayed
>in the upper left is just perfect:
>http://localhost:8983/solr/mmc_search3/select?echoParams=all&q=title:T%C3%BCbingen&wt=python
>
>The problems start to occur, when I click on this url:
>{
>  'responseHeader':{
>'status':0,
>'QTime':0,
>'params':{
>  'q':u'title:T\u00fcbingen',
>  'echoParams':'all',
>  'wt':'python'}},
>  'response':{'numFound':0,'start':0,'docs':[]
>  }}
>So it seems internally, Solr is changing the request (or a used
>library?). I just don’t have any idea why. But I would like to get the
>more than 3 million results. I could as well just enter the above url
>into my browser and the url will be changed to
>http://localhost:8983/solr/mmc_search3/select?echoParams=all&q=title:Tübingen&wt=python
>
>and I get the same result (no found documents). So this is the problem.
>However, when I copy paste the url, it is still displaying the utf8
>encoding. I thing the “ü” in the url is just an improved layout by the
>browser.
>
>The confusion with the different solr comes from the fact, that I am
>continuously trying to improve my search index and make it more
>efficient. Hence I reindexed it several times, always to the latest
>version. The last reindexing occurred for Solr 7.0.1. having the
>indexing for Lucene 7.0.1. However, I performed the test also for other
>versions without any success.
>
>3) As Rick said: "With the Yahoo Flickr Creative Commons 100 Million
>(YFCC100m) dataset, a great novel dataset was introduced to the
>computer vision and multimedia research community." — cool
>
>My objective it to make it better usable, especially by providing
>different search modalities. The dataset consists of 99 Million images
>and 800k videos, but I am only working on the Flickr as well as
>generated metadata and try to add more and more metadata. The next big
>challenge is similarity search.
>
>4)
>http://localhost:8983/solr/mmc_search3/select?echoParams=all&q=title:Tübingen&wt=python
>
>is displayed but it is
>http://localhost:8983/solr/mmc_search3/select?echoParams=all&q=title:T%C3%BCbingen&wt=python
>.
>
>5) I am searching for Tübingen. It is u-umlaut (LATIN SMALL LETTER U
>WITH DIAERESIS) as Rick said.
>
>6) I am just clicking on it in the admin solr standard interface. I
>could as well copy it into my webbrowser and open it. The result would
>be the same.
> 
>
>7) As you can see in the result, the document seems to be indexed
>correctly, isn’t it? If we can’t figure anything out, I will try to
>reindex again but this will take a while because of the large amount of
>data and my limited compute power.
>
>8) Thanks for the hint with echoparams. The result is displayed above.
>
>9) As shown in the attached search result, there are actually results
>correctly indexed.
>
>10) The above example is now with Python.
>
>11) @Rick: Shall I change the /select handler? I do not quite
>understand the problem with it. But maybe as an explanation, my
>original config was probably based on solr4.x. I basically just updated
>the Lucene version and 

Re: Java 9

2017-11-07 Thread Daniel Collins
Oh, blimey, have Oracle gone with Ubuntu-style numbering now? :)

On 7 November 2017 at 08:27, Markus Jelsma 
wrote:

> Shawn,
>
> There won't be a Java 10, we'll get Java 18.3 instead. After 9 it is a
> guess when CMS and friends are gone.
>
> Regards,
> Markus
>
>
>
> -Original message-
> > From:Shawn Heisey 
> > Sent: Tuesday 7th November 2017 0:24
> > To: solr-user@lucene.apache.org
> > Subject: Re: Java 9
> >
> > On 11/6/2017 3:07 PM, Petersen, Robert (Contr) wrote:
> > > Anyone else been noticing this this msg when starting up solr with
> java 9? (This is just an FYI and not a real question)
> > >
> > > Java HotSpot(TM) 64-Bit Server VM warning: Option UseConcMarkSweepGC
> was deprecated in version 9.0 and will likely be removed in a future
> release.
> > > Java HotSpot(TM) 64-Bit Server VM warning: Option UseParNewGC was
> deprecated in version 9.0 and will likely be removed in a future release.
> >
> > I have not tried Java 9 yet.
> >
> > Looks like G1 is now the default garbage collector.  I did not know that
> > they were deprecating CMS and ParNew ... that's a little surprising.
> > Solr's default garbage collection tuning uses those two collectors.  It
> > is likely that those choices will be available in all versions of Java
> > 9.  It would be very uncharacteristic for Oracle to take action on
> > removing them until version 10, possibly later.
> >
> > If it were solely up to me, I would adjust Solr's startup script to use
> > the G1 collector by default, eliminating the warnings you're seeing.
> > It's not just up to me though.  Lucene documentation says to NEVER use
> > the G1 collector because they believe it to be unpredictable and have
> > the potential to cause problems.  I personally have never had any issues
> > with it.  There is *one* Lucene issue mentioning problems with G1GC, and
> > that issue is *specific* to the 32-bit JVM, which is not recommended
> > because of the limited amount of memory it can use.
> >
> > My experiments with GC tuning show the G1 collector (now default in Java
> > 9) to have very good characteristics with Solr.  I have a personal page
> > on the Solr wiki that covers those experiments.
> >
> > https://wiki.apache.org/solr/ShawnHeisey
> >
> > Thanks,
> > Shawn
> >
> >
>


RE: Java 9

2017-11-07 Thread Markus Jelsma
Shawn,

There won't be a Java 10, we'll get Java 18.3 instead. After 9 it is a guess 
when CMS and friends are gone.

Regards,
Markus

 
 
-Original message-
> From:Shawn Heisey 
> Sent: Tuesday 7th November 2017 0:24
> To: solr-user@lucene.apache.org
> Subject: Re: Java 9
> 
> On 11/6/2017 3:07 PM, Petersen, Robert (Contr) wrote:
> > Anyone else been noticing this this msg when starting up solr with java 9? 
> > (This is just an FYI and not a real question)
> >
> > Java HotSpot(TM) 64-Bit Server VM warning: Option UseConcMarkSweepGC was 
> > deprecated in version 9.0 and will likely be removed in a future release.
> > Java HotSpot(TM) 64-Bit Server VM warning: Option UseParNewGC was 
> > deprecated in version 9.0 and will likely be removed in a future release.
> 
> I have not tried Java 9 yet.
> 
> Looks like G1 is now the default garbage collector.  I did not know that
> they were deprecating CMS and ParNew ... that's a little surprising. 
> Solr's default garbage collection tuning uses those two collectors.  It
> is likely that those choices will be available in all versions of Java
> 9.  It would be very uncharacteristic for Oracle to take action on
> removing them until version 10, possibly later.
> 
> If it were solely up to me, I would adjust Solr's startup script to use
> the G1 collector by default, eliminating the warnings you're seeing. 
> It's not just up to me though.  Lucene documentation says to NEVER use
> the G1 collector because they believe it to be unpredictable and have
> the potential to cause problems.  I personally have never had any issues
> with it.  There is *one* Lucene issue mentioning problems with G1GC, and
> that issue is *specific* to the 32-bit JVM, which is not recommended
> because of the limited amount of memory it can use.
> 
> My experiments with GC tuning show the G1 collector (now default in Java
> 9) to have very good characteristics with Solr.  I have a personal page
> on the Solr wiki that covers those experiments.
> 
> https://wiki.apache.org/solr/ShawnHeisey
> 
> Thanks,
> Shawn
> 
> 


Re: Faceting Word Count

2017-11-07 Thread Wael Kader
Hi,

The whole index has 100M documents, but when I add the criteria it will
filter the data down to maybe 10k rows at most.
The facet isn't working when the total number of records in the index is
100M, but it was working at 5M.

I have social media & RSS data in the index and I am trying to get the word
count for a specific user on specific date intervals.

Regards,
Wael

On Mon, Nov 6, 2017 at 3:42 PM, Erick Erickson 
wrote:

> _Why_ do you want to get the word counts? Faceting on all of the
> tokens for 100M docs isn't something Solr is ordinarily used for. As
> Emir says it'll take a huge amount of memory. You can use one of the
> function queries (termfreq IIRC) that will give you the count of any
> individual term you have and will be very fast.
>
> But getting all of the word counts in the index is probably not
> something I'd use Solr for.
>
> This may be an XY problem, you're asking how to do something specific
> (X) without explaining what the problem you're trying to solve is (Y).
> Perhaps there's another way to accomplish (Y) if we knew more about
> what it is.
>
> Best,
> Erick
>
>
>
> On Mon, Nov 6, 2017 at 4:15 AM, Emir Arnautović
>  wrote:
> > Hi Wael,
> > You are faceting on analyzed field. This results in field being
> uninverted - fieldValueCache being built - on first call after every
> commit. This is both time and memory consuming (you can check in admin
> console in stats how much memory it took).
> > What you need to do is to create multivalue string field (not text) and
> parse values (do analysis steps) on client side and store it like that.
> This will allow you to enable docValues on that field and avoid building
> fieldValueCache.
> >
> > HTH,
> > Emir
> > --
> > Monitoring - Log Management - Alerting - Anomaly Detection
> > Solr & Elasticsearch Consulting Support Training - http://sematext.com/
> >
> >
> >
> >> On 6 Nov 2017, at 13:06, Wael Kader  wrote:
> >>
> >> Hi,
> >>
> >> I am using a custom field. Below is the field definition.
> >> I am using this because I don't want stemming.
> >>
> >>
> >> >> positionIncrementGap="100">
> >>  
> >> >> mapping="mapping-ISOLatin1Accent.txt"/>
> >>
> >>
> >> >>ignoreCase="true"
> >>words="stopwords.txt"
> >>enablePositionIncrements="true"
> >>/>
> >> >>protected="protwords.txt"
> >>generateWordParts="0"
> >>generateNumberParts="1"
> >>catenateWords="1"
> >>catenateNumbers="1"
> >>catenateAll="0"
> >>splitOnCaseChange="1"
> >>preserveOriginal="1"/>
> >>
> >>
> >>
> >>  
> >>  
> >> >> mapping="mapping-ISOLatin1Accent.txt"/>
> >>
> >> synonyms="synonyms.txt"
> >> ignoreCase="true" expand="true"/>
> >> >>ignoreCase="true"
> >>words="stopwords.txt"
> >>enablePositionIncrements="true"
> >>/>
> >> 
> >> >>protected="protwords.txt"
> >>generateWordParts="0"
> >>catenateWords="0"
> >>catenateNumbers="0"
> >>catenateAll="0"
> >>splitOnCaseChange="1"
> >>preserveOriginal="1"/>
> >>
> >>
> >>
> >>
> >>  
> >>
> >>
> >>
> >> Regards,
> >> Wael
> >>
> >> On Mon, Nov 6, 2017 at 10:29 AM, Emir Arnautović <
> >> emir.arnauto...@sematext.com> wrote:
> >>
> >>> Hi Wael,
> >>> Can you provide your field definition and sample query.
> >>>
> >>> Thanks,
> >>> Emir
> >>> --
> >>> Monitoring - Log Management - Alerting - Anomaly Detection
> >>> Solr & Elasticsearch Consulting Support Training -
> http://sematext.com/
> >>>
> >>>
> >>>
>  On 6 Nov 2017, at 08:30, Wael Kader  wrote:
> 
>  Hello,
> 
>  I am having an index with around 100 Million documents.
>  I have a multivalued column that I am saving big chunks of text data
> in.
> >>> It
>  has around 20 GB of RAM and 4 CPU's.
> 
>  I was doing faceting on it to get word cloud but it was taking around
> 1
>  second to retrieve when the data was 5-10 Million .
>  Now I have more data and its taking minutes to get the results (that
> is
> >>> if
>  it gets it and SOLR doesn't crash). Whats the best way to make it run
> or
>  maybe its not scalable to make it run on my current schema and design
> >>> with
>  News articles.
> 
>  I am looking to find the best solution for this. Maybe create another
> >>> index
>  to split the data while inserting it or maybe if I change some
> settings
> >>> in
>  SolrConfig or add some RAM, it would perform better.
> 
>  --
>  Regards,
>  Wael
> >>>
> >>>
> >>
> >>
> >> --
> >> Regards,
> >> Wael
> >
>



-- 
Regards,
Wael


Re: Anyone have any comments on current solr monitoring favorites?

2017-11-07 Thread Daniel Ortega
Hi @Atita,

We are using the latest version (Solr 7.1.0).

As the metrics are exposed as MBeans via JMX, you could use the
Prometheus JMX exporter to read those metric values and expose them to
Prometheus. You can use it to monitor caches, response times, and the number
of errors in all the handlers you have defined.

To configure JMX in a Solr instance follow this link:
https://lucene.apache.org/solr/guide/6_6/using-jmx-with-solr.html

This page explains some of the JMX metrics that Solr exposes:
https://lucene.apache.org/solr/guide/6_6/performance-statistics-reference.html

Basically, the JMX exporter is an embedded Jetty server that reads values
exposed via JMX (on localhost or on a remote instance), parses those values
and exposes them in a format that Prometheus can scrape.
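
For reference, exposing JMX remotely on the Solr side is (if I remember
right) just a couple of lines in solr.in.sh, and the port is arbitrary:

ENABLE_REMOTE_JMX_OPTS="true"
RMI_PORT=18983

The JMX exporter is then pointed at that host:port from its own YAML
configuration file.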

Best regards,
Daniel

On Tue, Nov 7, 2017 at 2:43, Atita Arora 
wrote:

> Hi @Daniel ,
>
> What version of Solr are you using ?
> We gave prometheus + Jolokia + InfluxDB + Grafana a try , that came out
> well.
> With Solr 6.6 the metrics are exposed through the /metrics api, but how do
> we go about for the earlier versions , please guide ?
> Specifically the cache monitoring.
>
> Thanks in advance,
> Atita
>
> On Mon, Nov 6, 2017 at 2:19 PM, Daniel Ortega  >
> wrote:
>
> > Hi Robert,
> >
> > We use the following stack:
> >
> > - Prometheus to scrape metrics (https://prometheus.io/)
> > - Prometheus node exporter to export "machine metrics" (Disk, network
> > usage, etc.) (https://github.com/prometheus/node_exporter)
> > - Prometheus JMX exporter to export "Solr metrics" (Cache usage, QPS,
> > Response times...) (https://github.com/prometheus/jmx_exporter)
> > - Grafana to visualize all the data scrapped by Prometheus (
> > https://grafana.com/)
> >
> > Best regards
> > Daniel Ortega
> >
> > 2017-11-06 20:13 GMT+01:00 Petersen, Robert (Contr) <
> > robert.peters...@ftr.com>:
> >
> > > PS I knew sematext would be required to chime in here!  😊
> > >
> > >
> > > Is there a non-expiring dev version I could experiment with? I think I
> > did
> > > sign up for a trial years ago from a different company... I was
> actually
> > > wondering about hooking it up to my personal AWS based solr cloud
> > instance.
> > >
> > >
> > > Thanks
> > >
> > > Robi
> > >
> > > 
> > > From: Emir Arnautović 
> > > Sent: Thursday, November 2, 2017 2:05:10 PM
> > > To: solr-user@lucene.apache.org
> > > Subject: Re: Anyone have any comments on current solr monitoring
> > favorites?
> > >
> > > Hi Robi,
> > > Did you try Sematext’s SPM? It provides host, JVM and Solr metrics and
> > > more. We use it for monitoring our Solr instances and for consulting.
> > >
> > > Disclaimer - see signature :)
> > >
> > > Emir
> > > --
> > > Monitoring - Log Management - Alerting - Anomaly Detection
> > > Solr & Elasticsearch Consulting Support Training -
> http://sematext.com/
> > >
> > >
> > >
> > > > On 2 Nov 2017, at 19:35, Walter Underwood 
> > wrote:
> > > >
> > > > We use New Relic for JVM, CPU, and disk monitoring.
> > > >
> > > > I tried the built-in metrics support in 6.4, but it just didn’t do
> what
> > > we want. We want rates and percentiles for each request handler. That
> > gives
> > > us 95th percentile for textbooks suggest or for homework search results
> > > page, etc. The Solr metrics didn’t do that. The Jetty metrics didn’t do
> > > that.
> > > >
> > > > We built a dedicated servlet filter that goes in front of the Solr
> > > webapp and reports metrics. It has some special hacks to handle some
> > weird
> > > behavior in SolrJ. A request to the “/srp” handler is sent as
> > > “/select?qt=/srp”, so we normalize that.
> > > >
> > > > The metrics start with the cluster name, the hostname, and the
> > > collection. The rest is generated like this:
> > > >
> > > > URL: GET /solr/textbooks/select?q=foo&qt=/auto
> > > > Metric: textbooks.GET./auto
> > > >
> > > > URL: GET /solr/textbooks/select?q=foo
> > > > Metric: textbooks.GET./select
> > > >
> > > > URL: GET /solr/questions/auto
> > > > Metric: questions.GET./auto
> > > >
> > > > So a full metric for the cluster “solr-cloud” and the host “search01"
> > > would look like “solr-cloud.search01.solr.textbooks.GET./auto.m1_rate”.
> > > >
> > > > We send all that to InfluxDB. We’ve configured a template so that
> each
> > > part of the metric name is mapped to a field, so we can write efficient
> > > queries in InfluxQL.
> > > >
> > > > Metrics are graphed in Grafana. We have dashboards that mix
> Cloudwatch
> > > (for the load balancer) and InfluxDB.
> > > >
> > > > I’m still working out the kinks in some of the more complicated
> > queries,
> > > but the data is all there. I also want to expand the servlet filter to
> > > report HTTP response codes.
> > > >
> > > > wunder
> > > > Walter Underwood
> > > > wun...@wunderwood.org
> > > > http://observer.wunderwood.org/  (my blog)
> > > >
> > > >
> > > >> On Nov 2, 2017, at 9:30 AM, Petersen, Robert (Contr) <
> > > robert.peters...@