Re: Windows post.jar Can't unambiguously select between fixed arity signatures

2016-01-26 Thread Erik Hatcher
I haven't seen this error before but there is some crazy JavaScript in the 
example/files update processing.  If you're indexing CSV where each row is a 
separate document, example/files may not be the config you want to start with 
anyway.  Try creating your collection without that -d. 

Oh, I see the issue - example/files script looks for a "content" field (to 
extract email addresses and URLs from) and there isn't one (see "null" below). 
Again, example/files isn't designed for non-"content" documents.  I'll make a 
note to fix this issue (by skipping the extraction part when there is no 
content) though. 

   Erik

> On Jan 26, 2016, at 04:40, Netz, Steffen  
> wrote:
> 
> Hi,
> 
> I just downloaded Solr and am playing around.
> So far I have started the server and created a core:
> 
> bin\solr start
> bin\solr create -c files -d example\files\conf
> 
> Now I'm trying to post some files:
> java -Dauto  -Dc=files -jar example\exampledocs\post.jar 
> example\exampledocs\books.csv
> 
> and get the following error:
> org.apache.solr.common.SolrException: Unable to invoke function processAdd in 
> script: update-script.js: Can't unambiguously select between fixed arity 
> signatures [(java.lang.String, java.lang.String), (java.lang.String, 
> java.io.Reader)] of the method 
> org.apache.solr.analysis.TokenizerChain.tokenStream for argument types 
> [java.lang.String, null]
>at 
> org.apache.solr.update.processor.StatelessScriptUpdateProcessorFactory$ScriptUpdateProcessor.invokeFunction(StatelessScriptUpdateProcessorFactory.java:433)
> ...
> Caused by: java.lang.NoSuchMethodException: Can't unambiguously select 
> between fixed arity signatures [(java.lang.String, java.lang.String), 
> (java.lang.String, java.io.Reader)] of the method 
> org.apache.solr.analysis.TokenizerChain.tokenStream for argument types 
> [java.lang.String, null]
>at 
> jdk.internal.dynalink.beans.OverloadedMethod.throwAmbiguousMethod(OverloadedMethod.java:225)
> 
> 
> Any hint?
> Steffen
> 
> My Solr: 5.4.1
> My Java : java version "1.8.0_71"
> 
> 


Re: Query results change

2016-01-26 Thread Toke Eskildsen
On Mon, 2016-01-25 at 20:38 -0700, Shawn Heisey wrote:
> Very likely what's happening is that sometimes your shards are
> responding on a different timescale with each request, so the pieces
> that get combined into the final result set arrive in a different
> order.  This causes the Java object containing the results to get
> populated in a different order.

But it should not. Deterministic sort order is essential for paging.

Standard score-based sorting uses the shard-ID as tie breaker. If I am
not mistaken, that happens in the MergeSortQueue in the TopDocs?
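
A client-side safeguard worth noting (a minimal SolrJ sketch, separate from
the merge internals above; the uniqueKey field name "id" is an assumption)
is to add the uniqueKey as an explicit secondary sort, so that equal scores
always page back in the same order:

SolrQuery q = new SolrQuery("some query");
q.setSort("score", SolrQuery.ORDER.desc); // primary: relevance
q.addSort("id", SolrQuery.ORDER.asc);     // tie breaker: assumed uniqueKey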

- Toke Eskildsen, State and University Library, Denmark




Windows post.jar Can't unambiguously select between fixed arity signatures

2016-01-26 Thread Netz, Steffen
Hi,

I just downloaded Solr and am playing around.
So far I have started the server and created a core:

bin\solr start
bin\solr create -c files -d example\files\conf

Now I'm trying to post some files:
java -Dauto  -Dc=files -jar example\exampledocs\post.jar 
example\exampledocs\books.csv

and get the following error:
org.apache.solr.common.SolrException: Unable to invoke function processAdd in 
script: update-script.js: Can't unambiguously select between fixed arity 
signatures [(java.lang.String, java.lang.String), (java.lang.String, 
java.io.Reader)] of the method 
org.apache.solr.analysis.TokenizerChain.tokenStream for argument types 
[java.lang.String, null]
at 
org.apache.solr.update.processor.StatelessScriptUpdateProcessorFactory$ScriptUpdateProcessor.invokeFunction(StatelessScriptUpdateProcessorFactory.java:433)
...
Caused by: java.lang.NoSuchMethodException: Can't unambiguously select between 
fixed arity signatures [(java.lang.String, java.lang.String), 
(java.lang.String, java.io.Reader)] of the method 
org.apache.solr.analysis.TokenizerChain.tokenStream for argument types 
[java.lang.String, null]
at 
jdk.internal.dynalink.beans.OverloadedMethod.throwAmbiguousMethod(OverloadedMethod.java:225)


Any hint?
Steffen

My Solr: 5.4.1
My Java : java version "1.8.0_71"




Re: Accessing Index Modification Information

2016-01-26 Thread Björn Keil



Am 25.01.2016 um 16:25 schrieb Shawn Heisey:
> On 1/25/2016 7:05 AM, Björn Keil wrote:
>> I am using Solr 5.1 (within a Tomcat6 server) and am trying to find
>> out how to get information from a Solr server about the exact time of the
>> last commit and the total number of documents in a given index, and I
>> need to query this information in a scripted way.
>>
>> The total number of documents in a given index is the minor problem,
>> of course, since I can always send a query to the Solr server with q=*:*
>> and rows=0. But how do I get the exact time which is displayed as "last
>> modification time" in the user interface in the overview for the given
>> index?
>
> Send a request to /solr/corename/admin/mbeans ... depending on what
> specific info you are after, you may or may not need a URL parameter of
> stats=true.
>
> http://host:port/solr/corename/admin/mbeans?stats=true
>
> This includes all the info you asked for.
>
> FYI, you really should not run Solr 5.x in Tomcat.  It comes with a
> container (Jetty) included, and has start scripts that use this
> container.  There is even an install script that works on most
> non-Microsoft operating systems.  Running Solr in the provided way will
> yield a system that has been properly tuned for best results.  A default
> install of Tomcat is not tuned for Solr.
>
> Thanks,
> Shawn
>
... that helps a lot. Since I am processing the result with PHP I can
even attach a wt=phps parameter to the request, which makes it a lot easier
for me to process.
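
For scripted access, here is a minimal SolrJ 5.x sketch of both requests
(the core name is an assumption, and exactly which searcher/core stats keys
carry the last-modified time should be verified against the actual response):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.util.NamedList;

public class CoreStats {
  public static void main(String[] args) throws Exception {
    HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr/corename");

    // Total documents: the q=*:* and rows=0 trick described above.
    SolrQuery count = new SolrQuery("*:*");
    count.setRows(0);
    System.out.println("numDocs: " + client.query(count).getResults().getNumFound());

    // MBean stats: a request handler name starting with "/" is used as the path.
    SolrQuery mbeans = new SolrQuery();
    mbeans.setRequestHandler("/admin/mbeans");
    mbeans.set("stats", "true");
    QueryResponse rsp = client.query(mbeans);
    NamedList<?> categories = (NamedList<?>) rsp.getResponse().get("solr-mbeans");
    System.out.println(categories); // inspect CORE/searcher for the timestamps
    client.close();
  }
}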



Re: Solr QTime explanation

2016-01-26 Thread Damien Picard
Thank you, you are right! It seems to be congestion from our test tool.

Regards,

2016-01-19 18:46 GMT+01:00 Toke Eskildsen :

> Damien Picard  wrote:
> > Currently we have 4 Solr nodes, with 12Gb memory (heap) ; the collections
> > are replicated (4 shards, 1 replica).
> > This query mostly returns a QTime=4 and it takes around 20ms on the
> client
> > side to get the result.
>
> > We have to handle around 200 simultaneous connections.
>
> You are probably experiencing congestion. JMeter can visualize throughput.
> Try experimenting with 10 to 100 concurrent threads in increments of 10
> threads and look at throughput underway. My guess is that throughput will
> rise as you increase threads, until some point after which it will fall
> again as the Solrs exceed their peak performance point. You might end up
> getting better performance by rate-limiting outside of SolrCloud.
>
> Also, what does 200 simultaneous connections mean? Is that 200 requests
> per second?
>
> - Toke Eskildsen
>



-- 
Damien Picard
Expert GWT

Mob : 06 11 51 47 78


Re: unmerged index segments

2016-01-26 Thread James Mason
Hi Jack,

Sorry, I should have put them on my original message.

All merge policy settings are at their default except mergeFactor, which I now 
notice is quite high at 45. Unfortunately I don’t have the full history to see 
when this setting was changed, but I do know they haven’t been changed for well 
over a year, and that we did originally run Solr using the default settings.

So reading about mergeFactor it sounds like this is likely the problem, and 
we’re simply not asking Solr to merge into these old and large segments yet?

If I was to change this back down to the default of 10, would you expect we’d 
get quite an immediate and intense period of merging? 

If I was to launch a duplicate test Solr instance, change the merge factor, 
and simply leave it for a few days, would it perform the background merge (so I 
can test to see if there’s enough memory etc for the merge to complete?).

Thanks,

James



> On 25 Jan 2016, at 21:39, Jack Krupansky  wrote:
> 
> What exactly are your merge policy settings in solrconfig? They control
> when the background merges will be performed. Sometimes they do need to be
> tweaked.
> 
> -- Jack Krupansky
> 
> On Mon, Jan 25, 2016 at 1:50 PM, James Mason 
> wrote:
> 
>> Hi,
>> 
>> I have a large index that has been added to over several years, and
>> I’ve discovered that I have many segments that haven’t been updated for
>> well over a year, even though I’m adding, updating and deleting records
>> daily. My five largest segments all haven’t been updated for over a year.
>> 
>> Meanwhile, the number of segments I have keeps on increasing, and I have
>> hundreds of segment files that don’t seem to be getting merged past a
>> certain size (e.g. the largest is 2Gb but my older segments are over 100Gb).
>> 
>> My understanding was that background merges should be merging these older
>> segments with newer data over time, but this doesn’t seem to be the case.
>> 
>> I’m using Solr 4.9, but I was using an older version at the time that
>> these ‘older’ segments were created.
>> 
>> Any help on suggestions of what’s happening would be very much
>> appreciated. And also any suggestion on how I can monitor what’s happening
>> with the background merges.
>> 
>> Thanks,
>> 
>> James



Re: Windows post.jar Can't unambiguously select between fixed arity signatures

2016-01-26 Thread Erik Hatcher
Steffen - I added a note to fix this in this JIRA ticket - 
https://issues.apache.org/jira/browse/SOLR-8590 


Some options for you: don’t use -d example/files (you can actually omit the 
“\conf” part of that parameter, nicely!) for books.csv.  Only use 
example/files for true rich document “files” (such as PDF, Word, HTML, plain 
text, etc).  If for some reason example/files (yay!  there are some good tricks 
in there, and I’m about to hit publish on a blog post on it just before the 
webinar about it tomorrow*!) adds some value for your needs, tinker with 
conf/update-script.js to avoid the use of the “content” field.  

Erik

* webinar: 
https://programs.lucidworks.com/Webinar-Solr-example-files.html?utm_source=hp 
 
and a blog post series whose final installment will be published soon 
too: https://lucidworks.com/blog/2015/12/08/browse-new-improved-solr-5/ 




> On Jan 26, 2016, at 4:40 AM, Netz, Steffen  
> wrote:
> 
> Hi,
> 
> I just downloaded Solr and am playing around.
> So far I have started the server and created a core:
> 
> bin\solr start
> bin\solr create -c files -d example\files\conf
> 
> Now I'm trying to post some files:
> java -Dauto  -Dc=files -jar example\exampledocs\post.jar 
> example\exampledocs\books.csv
> 
> and get the following error:
> org.apache.solr.common.SolrException: Unable to invoke function processAdd in 
> script: update-script.js: Can't unambiguously select between fixed arity 
> signatures [(java.lang.String, java.lang.String), (java.lang.String, 
> java.io.Reader)] of the method 
> org.apache.solr.analysis.TokenizerChain.tokenStream for argument types 
> [java.lang.String, null]
>at 
> org.apache.solr.update.processor.StatelessScriptUpdateProcessorFactory$ScriptUpdateProcessor.invokeFunction(StatelessScriptUpdateProcessorFactory.java:433)
> ...
> Caused by: java.lang.NoSuchMethodException: Can't unambiguously select 
> between fixed arity signatures [(java.lang.String, java.lang.String), 
> (java.lang.String, java.io.Reader)] of the method 
> org.apache.solr.analysis.TokenizerChain.tokenStream for argument types 
> [java.lang.String, null]
>at 
> jdk.internal.dynalink.beans.OverloadedMethod.throwAmbiguousMethod(OverloadedMethod.java:225)
> 
> 
> Any hint?
> Steffen
> 
> My Solr: 5.4.1
> My Java : java version "1.8.0_71"
> 
> 



AW: Windows post.jar Can't unambiguously select between fixed arity signatures

2016-01-26 Thread Netz, Steffen
Hi,
thanks so much!
It works now, without the -d switch!

Two related beginner questions:
Is there a good tutorial for filesystem search (with how-to
config files)?
Is there a search for the mailing list? I can't find it.

Wow, I just looked at the webinar! Great! I have a lot to learn :)

regards
Steffen

-----Original Message-----
From: Erik Hatcher [mailto:erik.hatc...@gmail.com] 
Sent: Tuesday, January 26, 2016 16:15
To: solr-user@lucene.apache.org
Subject: Re: Windows post.jar Can't unambiguously select between fixed arity 
signatures

Steffen - I added a note to fix this in this JIRA ticket - 
https://issues.apache.org/jira/browse/SOLR-8590 


Some options for you: don’t use -d example/files (you can actually omit the 
“\conf” part of that parameter, nicely!) for books.csv.  Only use 
example/files for true rich document “files” (such as PDF, Word, HTML, plain 
text, etc).  If for some reason example/files (yay!  there are some good tricks 
in there, and I’m about to hit publish on a blog post on it just before the 
webinar about it tomorrow*!) adds some value for your needs, tinker with 
conf/update-script.js to avoid the use of the “content” field.  

Erik

* webinar: 
https://programs.lucidworks.com/Webinar-Solr-example-files.html?utm_source=hp 
 
and a blog post series whose final installment will be published soon 
too: https://lucidworks.com/blog/2015/12/08/browse-new-improved-solr-5/ 




> On Jan 26, 2016, at 4:40 AM, Netz, Steffen  
> wrote:
> 
> Hi,
> 
> I just downloaded Solr and am playing around.
> So far I have started the server and created a core:
> 
> bin\solr start
> bin\solr create -c files -d example\files\conf
> 
> Now I'm trying to post some files:
> java -Dauto  -Dc=files -jar example\exampledocs\post.jar 
> example\exampledocs\books.csv
> 
> and get the following error:
> org.apache.solr.common.SolrException: Unable to invoke function processAdd in 
> script: update-script.js: Can't unambiguously select between fixed arity 
> signatures [(java.lang.String, java.lang.String), (java.lang.String, 
> java.io.Reader)] of the method 
> org.apache.solr.analysis.TokenizerChain.tokenStream for argument types 
> [java.lang.String, null]
>at 
> org.apache.solr.update.processor.StatelessScriptUpdateProcessorFactory$ScriptUpdateProcessor.invokeFunction(StatelessScriptUpdateProcessorFactory.java:433)
> ...
> Caused by: java.lang.NoSuchMethodException: Can't unambiguously select 
> between fixed arity signatures [(java.lang.String, java.lang.String), 
> (java.lang.String, java.io.Reader)] of the method 
> org.apache.solr.analysis.TokenizerChain.tokenStream for argument types 
> [java.lang.String, null]
>at 
> jdk.internal.dynalink.beans.OverloadedMethod.throwAmbiguousMethod(OverloadedMethod.java:225)
> 
> 
> Any hint?
> Steffen
> 
> My Solr: 5.4.1
> My Java : java version "1.8.0_71"
> 
> 



Re: unmerged index segments

2016-01-26 Thread Jack Krupansky
Sorry I don't have any specific guidance since the results are so
unpredictable. But a much lower mergeFactor should result in more frequent
merges, which should reduce segment count but may slow indexing down.

If you make the change and then add enough documents to exceed the segment
size limit (ramBufferSizeMB and maxBufferedDocs), then it should trigger
the merge, we hope.

You may also have to use your own explicit <mergePolicy> in order to get
control over more of the parameters of TieredMergePolicy, which is the
default. Solr is using <mergeFactor> to set the maxMergeAtOnce and
segmentsPerTier options to be the same, but you may want to change them to
differ.
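
For illustration, here are the same knobs expressed through Lucene's
TieredMergePolicy API (a hedged sketch against the Lucene 5.x API; the
values are only placeholders, not recommendations):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.TieredMergePolicy;

public class MergeTuning {
  public static IndexWriterConfig config() {
    TieredMergePolicy mp = new TieredMergePolicy();
    mp.setMaxMergeAtOnce(10);           // segments combined in a single merge
    mp.setSegmentsPerTier(10.0);        // allowed segments per tier; lower = more merging
    mp.setMaxMergedSegmentMB(5 * 1024); // cap on a merged segment's size
    IndexWriterConfig iwc = new IndexWriterConfig(new StandardAnalyzer());
    iwc.setMergePolicy(mp);
    return iwc;
  }
}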

Some doc to read:
https://cwiki.apache.org/confluence/display/solr/IndexConfig+in+SolrConfig
https://wiki.apache.org/solr/SolrPerformanceFactors

The official Solr doc doesn't detail all the merge policy settings,
pointing you to the Javadoc, which for Tiered is here:
http://lucene.apache.org/core/5_4_0/core/org/apache/lucene/index/TieredMergePolicy.html

I did doc all of these options (as of Solr 4.4) in my Solr 4.x Deep Dive
e-book and I don't think much of that has changed since then:
http://www.lulu.com/us/en/shop/jack-krupansky/solr-4x-deep-dive-early-access-release-7/ebook/product-21203548.html

-- Jack Krupansky

On Tue, Jan 26, 2016 at 3:37 AM, James Mason 
wrote:

> Hi Jack,
>
> Sorry, I should have put them on my original message.
>
> All merge policy settings are at their default except mergeFactor, which I
> now notice is quite high at 45. Unfortunately I don’t have the full history
> to see when this setting was changed, but I do know they haven’t been
> changed for well over a year, and that we did originally run Solr using the
> default settings.
>
> So reading about mergeFactor it sounds like this is likely the problem,
> and we’re simply not asking Solr to merge into these old and large segments
> yet?
>
> If I was to change this back down to the default of 10, would you expect
> we’d get quite an immediate and intense period of merging?
>
> If I was to launch a duplicate test Solr instance, change the merge
> factor, and simply leave it for a few days, would it perform the background
> merge (so I can test to see if there’s enough memory etc for the merge to
> complete?).
>
> Thanks,
>
> James
>
>
>
> > On 25 Jan 2016, at 21:39, Jack Krupansky 
> wrote:
> >
> > What exactly are your merge policy settings in solrconfig? They control
> > when the background merges will be performed. Sometimes they do need to
> be
> > tweaked.
> >
> > -- Jack Krupansky
> >
> > On Mon, Jan 25, 2016 at 1:50 PM, James Mason 
> > wrote:
> >
> >> Hi,
> >>
> >> I have a large index that has been added to over several years, and
> >> I’ve discovered that I have many segments that haven’t been updated for
> >> well over a year, even though I’m adding, updating and deleting records
> >> daily. My five largest segments all haven’t been updated for over a
> year.
> >>
> >> Meanwhile, the number of segments I have keeps on increasing, and I have
> >> hundreds of segment files that don’t seem to be getting merged past a
> >> certain size (e.g. the largest is 2Gb but my older segments are over
> 100Gb).
> >>
> >> My understanding was that background merges should be merging these
> older
> >> segments with newer data over time, but this doesn’t seem to be the
> case.
> >>
> >> I’m using Solr 4.9, but I was using an older version at the time that
> >> these ‘older’ segments were created.
> >>
> >> Any help on suggestions of what’s happening would be very much
> >> appreciated. And also any suggestion on how I can monitor what’s
> happening
> >> with the background merges.
> >>
> >> Thanks,
> >>
> >> James
>
>


migrating solr 4.2.1 to 5.X

2016-01-26 Thread Midas A
I want to migrate from Solr 4.2.1 to a 5.x version; my questions are:

- Can I use the same snapshot of 4.2.1 in 5.x.x?

Actually, indexing will take a long time in my case, so is it possible
to do this, or should we not do it?


My next, similar question is:

- Can we replicate from a 4.2.1 master to a 5.x.x slave?


Determine if Merge is triggered in SOLR

2016-01-26 Thread abhi Abhishek
Hi All,
is there a way in Solr to determine whether a merge has been triggered?
Is there an API exposed to query this?

If it's not available, is there a way to do the same using the Lucene jar
files available in the Solr libs?

Appreciate your help.

Best Regards,
Abhishek


Re: How to use DocValues with TextField

2016-01-26 Thread Harry Yoo
Hi, I actually needed this functionality for a long time, and I made up an 
extended data type to work around it. 

In my use case, I need a case-insensitive search for a relatively short string 
and at the same time, I need faceting on the original string. For example, 
“Human, Homo sapiens” is an original input, and I want it to be searched by 
human, Human, homo sapiens or Homo Sapiens. 

Here is my workaround,

public class TextDocValueField extends TextField {

  @Override
  public List<IndexableField> createFields(SchemaField field, Object value,
      float boost) {
    if (field.hasDocValues()) {
      List<IndexableField> fields = new ArrayList<>();
      fields.add(createField(field, value, boost));
      final BytesRef bytes = new BytesRef(value.toString());
      if (field.multiValued()) {
        fields.add(new SortedSetDocValuesField(field.getName(), bytes));
      } else {
        fields.add(new SortedDocValuesField(field.getName(), bytes));
      }
      return fields;
    } else {
//    return Collections.singletonList(createField(field, value, boost));
      return super.createFields(field, value, boost);
    }
  }

  @Override
  public void checkSchemaField(final SchemaField field) {
    // do nothing
  }

  @Override
  public boolean multiValuedFieldCache() {
    return false;
  }
}

I wish this could be supported by Solr so that I don’t have to maintain my own 
repo.



What do you think?

Regards,
Harry


> On Jan 5, 2016, at 10:51 PM, Alok Bhandari  
> wrote:
> 
> Thanks Markus.
> 
> 
> 



Re: indexing rich data with solr 5.3.1 integreting in Ubuntu server

2016-01-26 Thread kostali hassan
They are loaded, because Solr is indexing .doc and .docx (MS Word) files but
fails for PDF files.

2016-01-26 12:49 GMT+00:00 Emir Arnautovic :

> Hi,
> I would first check if external libraries are present and loaded. How do
> you start Solr? Try explicitly setting solr.install.dir or set an absolute
> path to the libs and see in the logs if they are loaded.
>
> <lib dir="${solr.install.dir:../../../..}/contrib/extraction/lib" regex=".*\.jar" />
>
>
> Thanks,
> Emir
>
> On 25.01.2016 15:16, kostali hassan wrote:
>
>>
>> I have a problem with integrating Solr on an Ubuntu server. Before using Solr
>> on the Ubuntu server I tested it on my Mac, where it was working perfectly for
>> the DIH request handler and update/extract: it indexed my PDF, Doc, and Docx
>> documents. After installing Solr on the Ubuntu server and using the same
>> configuration files and libraries, I've found that Solr doesn't index PDF
>> documents, with no errors or exceptions in the Solr log. But I can search over
>> .doc and .docx documents.
>>
>> Here are some parts of my solrconfig.xml contents:
>>
>> <lib dir="${solr.install.dir:../../../..}/contrib/extraction/lib" regex=".*\.jar" />
>> <lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-cell-\d.*\.jar" />
>>
>> <requestHandler name="/update/extract" startup="lazy"
>>                 class="solr.extraction.ExtractingRequestHandler">
>>   <lst name="defaults">
>>     <str name="lowernames">true</str>
>>     <str name="fmap.meta">ignored_</str>
>>     <str name="fmap.content">_text_</str>
>>   </lst>
>> </requestHandler>
>>
>> DIH config:
>>
>> <requestHandler name="/dataimport"
>>                 class="org.apache.solr.handler.dataimport.DataImportHandler">
>>   <lst name="defaults">
>>     <str name="config">tika.config.xml</str>
>>   </lst>
>> </requestHandler>
>>
>> tika.config.xml:
>>
>> <dataConfig>
>>   <dataSource type="BinFileDataSource" name="files"/>
>>   <document>
>>     <entity name="files" processor="FileListEntityProcessor"
>>             dataSource="null" rootEntity="false"
>>             baseDir="D:\Lucene\document"
>>             fileName=".*\.(DOC)|(PDF)|(pdf)|(doc)|(docx)|(ppt)"
>>             onError="skip"
>>             recursive="true">
>>       <!-- ... -->
>>       <entity name="documentImport"
>>               dataSource="files"
>>               processor="TikaEntityProcessor"
>>               url="${files.fileAbsolutePath}"
>>               format="text">
>>         <field column="title" name="title" meta="true"/>
>>         <field column="text" name="content"/>
>>         <field column="..." name="LastModifiedBy" meta="true"/>
>>       </entity>
>>     </entity>
>>   </document>
>> </dataConfig>
>>
>>
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/
>
>


Re: How to use DocValues with TextField

2016-01-26 Thread Erick Erickson
DocValues was designed to support unanalyzed types
originally. I don't know that code, but given my respect
for the people who wrote it, I'd be amazed if there weren't
very good reasons this is true. I suspect your work-around
is going to be "surprising".

And have you tested your change at scale? I suspect
searching won't scale well.

bq:  I need a case-insensitive search for a relatively short string
and at the same time, I need faceting on the original string

There's no reason at all to change code to do this. Just use a copyField.
The field that's to be faceted on is a "string" type with docValues=true, and
the searchable field is some text type with the appropriate analysis chain.

This doesn't really make much difference memory wise since the indexing
and docValues are separate in the first place. I.e. if I specify
indexed=true and docValues=true I get _two_ sets of data indexed.

Best,
Erick

On Tue, Jan 26, 2016 at 8:50 AM, Harry Yoo  wrote:
> Hi, I actually needed this functionality for a long time and I made up an 
> extended data type to work around it.
>
> In my use case, I need a case-insensitive search for a relatively short 
> string and at the same time, I need faceting on the original string. For 
> example, “Human, Homo sapiens” is an original input, and I want it to be 
> searched by human, Human, homo sapiens or Homo Sapiens.
>
> Here is my workaround,
>
> public class TextDocValueField extends TextField {
>
>   @Override
>   public List<IndexableField> createFields(SchemaField field, Object value,
>       float boost) {
>     if (field.hasDocValues()) {
>       List<IndexableField> fields = new ArrayList<>();
>       fields.add(createField(field, value, boost));
>       final BytesRef bytes = new BytesRef(value.toString());
>       if (field.multiValued()) {
>         fields.add(new SortedSetDocValuesField(field.getName(), bytes));
>       } else {
>         fields.add(new SortedDocValuesField(field.getName(), bytes));
>       }
>       return fields;
>     } else {
> //    return Collections.singletonList(createField(field, value, boost));
>       return super.createFields(field, value, boost);
>     }
>   }
>
>   @Override
>   public void checkSchemaField(final SchemaField field) {
>     // do nothing
>   }
>
>   @Override
>   public boolean multiValuedFieldCache() {
>     return false;
>   }
> }
>
> I wish this could be supported by Solr so that I don’t have to maintain my own 
> repo.
>
>
>
> What do you think?
>
> Regards,
> Harry
>
>
>> On Jan 5, 2016, at 10:51 PM, Alok Bhandari  
>> wrote:
>>
>> Thanks Markus.
>>
>>
>>
>> --
>> View this message in context: 
>> http://lucene.472066.n3.nabble.com/How-to-use-DocValues-with-TextField-tp4248647p4248797.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: migrating solr 4.2.1 to 5.X

2016-01-26 Thread Erick Erickson
Yes and Yes. The developers try very hard to make Solr
one major release backwards compatible. So 5x should be
able to read 4x just fine.

Nothing has really changed in replication, so 5x supports
master/slave. It's just becoming less popular as SolrCloud
is significantly easier to operationalize.

Note that as segments get written they will be transformed from 4x
format to 5x. And you can also use the index upgrade tool here:
https://lucene.apache.org/core/5_0_0/core/org/apache/lucene/index/IndexUpgrader.html

to transform your 4x to 5x
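
For example, a hedged sketch of the upgrader via the Lucene 5.x Java API
(the index path is an assumption; the tool can also be run from the command
line as shown in the Javadoc above, and backing up the index first is wise):

import java.nio.file.Paths;
import org.apache.lucene.index.IndexUpgrader;
import org.apache.lucene.store.FSDirectory;

public class UpgradeIndex {
  public static void main(String[] args) throws Exception {
    try (FSDirectory dir = FSDirectory.open(Paths.get("/var/solr/data/files/data/index"))) {
      new IndexUpgrader(dir).upgrade(); // rewrites all segments in the current format
    }
  }
}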

Best,
Erick

On Tue, Jan 26, 2016 at 7:51 AM, Midas A  wrote:
> I want to migrate from Solr 4.2.1 to a 5.x version; my questions are:
>
> - Can I use the same snapshot of 4.2.1 in 5.x.x?
>
> Actually, indexing will take a long time in my case, so is it possible
> to do this, or should we not do it?
>
>
> My next, similar question is:
>
> - Can we replicate from a 4.2.1 master to a 5.x.x slave?


performance effect a thread doing an update has on other search threads

2016-01-26 Thread derekallwardt
We have an application (backed by Solr 5.x) that does a lot of updates
interleaved with queries. For the sake of better understanding the
performance effect that the ratio of updates to queries has on query
performance, we tested the following two scenarios.

scenario 1:
10 threads doing updates to docs w/ softCommit=true and waitSearcher=true
20 threads doing queries

scenario 2:
20 threads doing updates to docs w/ softCommit=true and waitSearcher=true
10 threads doing queries

- assume that in both scenarios system load, GC, etc... is not contributing
to any performance degradation
- both scenarios pretty much ensure that the Solr caches will always be
invalidated, so the queries in either scenario don't get the benefit of the
cache 

scenario 2 has much worse query performance. Why?





Re: Solr partial date range search

2016-01-26 Thread vsriram30
Thanks Shawn for providing more info. It looks like, for supporting partial
date range search, I would need to rely on a String regex search like
fieldName:2016-01*

Though this can support part of the functionality, if I would like to
search between a start and an end date, like
fieldName:[2016-01-10 TO 2016-01-21], it does not work out well. That
cannot be achieved by indexing the date in a String field.

Hence, as you mentioned, I would need to upgrade my Solr to 5.x to fully
support this. Thanks again.

-Sriram





Re: Solr partial date range search

2016-01-26 Thread vsriram30
Probably I should not have said it cannot be achieved, as we can still
achieve it by using multiple OR queries with regex matching on that String
field, though it doesn't look good :-)

-Sriram





Re: SolrCloud replicas out of sync

2016-01-26 Thread Jeff Wartes

My understanding is that the "version" represents the timestamp the searcher 
was opened, so it doesn’t really offer any assurances about your data.

Although you could probably bounce a node and get your document counts back in 
sync (by provoking a check), it’s interesting that you’re in this situation. It 
implies to me that at some point the leader couldn’t write a doc to one of the 
replicas, but that the replica didn’t consider itself down enough to check 
itself.

You might watch the achieved replication factor of your updates and see if it 
ever changes:
https://cwiki.apache.org/confluence/display/solr/Read+and+Write+Side+Fault+Tolerance
 (See Achieved Replication Factor/min_rf)

If it does, that might give you clues about how this is happening. Also, it 
might allow you to work around the issue by trying the write again.
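
As a hedged SolrJ 5.x sketch (the collection name and replica count are
assumptions), the min_rf parameter from that page can be checked like this:

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.request.UpdateRequest;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.common.util.NamedList;

public class MinRfCheck {
  public static void main(String[] args) throws Exception {
    CloudSolrClient client = new CloudSolrClient("zk1:2181,zk2:2181/solr");
    client.setDefaultCollection("mycollection");

    UpdateRequest req = new UpdateRequest();
    req.setParam("min_rf", "3"); // ask Solr to report the achieved factor
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "doc-1");
    req.add(doc);

    NamedList<Object> rsp = client.request(req);
    int achieved = client.getMinAchievedReplicationFactor("mycollection", rsp);
    if (achieved < 3) {
      // fewer replicas than expected acknowledged the update: log and retry
    }
    client.close();
  }
}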






On 1/22/16, 10:52 AM, "David Smith"  wrote:

>I have a SolrCloud v5.4 collection with 3 replicas that appear to have fallen 
>permanently out of sync.  Users started to complain that the same search, 
>executed twice, sometimes returned different result counts.  Sure enough, our 
>replicas are not identical:
>
>>> shard1_replica1:  89867 documents / version 1453479763194
>>> shard1_replica2:  89866 documents / version 1453479763194
>>> shard1_replica3:  89867 documents / version 1453479763191
>
>I do not think this discrepancy is going to resolve itself.  The Solr Admin 
>screen reports all 3 replicas as “Current”.  The last modification to this 
>collection was 2 hours before I captured this information, and our auto commit 
>time is 60 seconds.  
>
>I have a lot of concerns here, but my first question is if anyone else has had 
>problems with out of sync replicas, and if so, what they have done to correct 
>this?
>
>Kind Regards,
>
>David
>


Re: Windows post.jar Can't unambiguously select between fixed arity signatures

2016-01-26 Thread Erik Hatcher
> On Jan 26, 2016, at 10:25 AM, Netz, Steffen  
> wrote:
> 2 related beginner questions:
>   Is there a good tutorial for filesystem search ( with howto 
> config-files)?

I’d humbly like to submit that example/files is this.  It’s a work in progress, 
with potential but plenty of rough edges (and I’ve got fixes for the below and 
some other things, soon to be uploaded as a patch on SOLR-8590), but it’s 
designed to work with anything with “content”, even image files that might have 
kinda junk or empty content.  Anything that /update/extract (via Tika) can 
handle.  

So in that case, 'bin/solr create -c files -d example/files; bin/post -c files 
some_files/; open http://localhost:8983/solr/files/browse' is the way to go.   
Let’s work to make it better!   Suggestions welcome on that JIRA ticket.

Erik




Re: SolrCloud replicas out of sync

2016-01-26 Thread David Smith
Thanks Jeff!  A few comments

>>
>> Although you could probably bounce a node and get your document counts back 
>> in sync (by provoking a check)
>>
 

If the check is a simple doc count, that will not work. We have found that 
replica1 and replica3, although they contain the same doc count, don’t have the 
SAME docs.  They each missed at least one update, but of different docs.  This 
also means none of our three replicas are complete.

>>
>>it’s interesting that you’re in this situation. It implies to me that at some 
>>point the leader couldn’t write a doc to one of the replicas,
>>

That is our belief as well. We experienced a datacenter-wide network disruption 
of a few seconds, and user complaints started the first workday after that 
event.  

The most interesting log entry during the outage is this:

"1/19/2016, 5:08:07 PM ERROR null DistributedUpdateProcessorRequest says it is 
coming from leader,​ but we are the leader: 
update.distrib=FROMLEADER=http://dot.dot.dot.dot:8983/solr/blah_blah_shard1_replica3/=javabin=2;

>>
>> You might watch the achieved replication factor of your updates and see if 
>> it ever changes
>>

This is a good tip. I’m not sure I like the implication that any failure to 
write all 3 of our replicas must be retried at the app layer.  Is this really 
how SolrCloud applications must be built to survive network partitions without 
data loss? 

Regards,

David


On 1/26/16, 12:20 PM, "Jeff Wartes"  wrote:

>
>My understanding is that the "version" represents the timestamp the searcher 
>was opened, so it doesn’t really offer any assurances about your data.
>
>Although you could probably bounce a node and get your document counts back in 
>sync (by provoking a check), it’s interesting that you’re in this situation. 
>It implies to me that at some point the leader couldn’t write a doc to one of 
>the replicas, but that the replica didn’t consider itself down enough to check 
>itself.
>
>You might watch the achieved replication factor of your updates and see if it 
>ever changes:
>https://cwiki.apache.org/confluence/display/solr/Read+and+Write+Side+Fault+Tolerance
> (See Achieved Replication Factor/min_rf)
>
>If it does, that might give you clues about how this is happening. Also, it 
>might allow you to work around the issue by trying the write again.
>
>
>
>
>
>
>On 1/22/16, 10:52 AM, "David Smith"  wrote:
>
>>I have a SolrCloud v5.4 collection with 3 replicas that appear to have fallen 
>>permanently out of sync.  Users started to complain that the same search, 
>>executed twice, sometimes returned different result counts.  Sure enough, our 
>>replicas are not identical:
>>
>>>> shard1_replica1:  89867 documents / version 1453479763194
>>>> shard1_replica2:  89866 documents / version 1453479763194
>>>> shard1_replica3:  89867 documents / version 1453479763191
>>
>>I do not think this discrepancy is going to resolve itself.  The Solr Admin 
>>screen reports all 3 replicas as “Current”.  The last modification to this 
>>collection was 2 hours before I captured this information, and our auto 
>>commit time is 60 seconds.  
>>
>>I have a lot of concerns here, but my first question is if anyone else has 
>>had problems with out of sync replicas, and if so, what they have done to 
>>correct this?
>>
>>Kind Regards,
>>
>>David
>>



Re: Solr partial date range search

2016-01-26 Thread Erick Erickson
You do not have to upgrade or use strings to support this use-case,
just specify the full date format.

i.e. instead of this:
2016-01-10 TO 2016-01-21

use this:
2016-01-10T00:00:00Z TO 2016-01-21T00:00:00Z

Your performance will be much worse with string types and regexes
 than either tdate or dateRange.
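
For example (a minimal sketch; the field name "timestamp" is an assumption),
the client can expand the partial dates before building the query:

import org.apache.solr.client.solrj.SolrQuery;

public class DateRange {
  public static SolrQuery build(String fromDay, String toDay) {
    // fromDay = "2016-01-10", toDay = "2016-01-21" become full instants
    String fq = "timestamp:[" + fromDay + "T00:00:00Z TO " + toDay + "T00:00:00Z]";
    SolrQuery q = new SolrQuery("*:*");
    q.addFilterQuery(fq);
    return q;
  }
}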

Best,
Erick

On Tue, Jan 26, 2016 at 10:13 AM, vsriram30  wrote:
> Probably, I should not have mentioned, it cannot be achieved, as still we can
> achieve that by using multiple OR queries with regex matching on that String
> field, though it doesn't look good :-)
>
> -Sriram
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr-partial-date-range-search-tp4253226p4253415.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: performance effect a thread doing an update has on other search threads

2016-01-26 Thread Erick Erickson
Because the second form is opening searchers, invalidating caches and
doing warmup queries twice as often would be my guess.

But this is an invalid test in my opinion. I'm reading this that
you're issuing the soft commits from the clients. This is definitely
an anti-pattern, I _strongly_ recommend that you set your commit (both
soft and hard) in solrconfig.xml and do NOT do this from the clients.

Here's a long blog on the topic:
http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
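
A minimal SolrJ 5.x sketch of the recommended pattern (the URL and field
values are assumptions): the client only adds documents, and the
autoCommit/autoSoftCommit settings in solrconfig.xml decide when searchers
reopen.

import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class IndexWithoutClientCommits {
  public static void main(String[] args) throws Exception {
    HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr/mycollection");
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "doc-1");
    client.add(doc); // no commitWithin, no softCommit parameter
    // deliberately no client.commit(): the server opens new searchers on
    // its own schedule rather than once per client thread
    client.close();
  }
}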

Best,
Erick

On Tue, Jan 26, 2016 at 10:04 AM, derekallwardt  wrote:
> We have an application (backed by Solr 5.x) that does a lot of updates
> interleaved with queries. For the sake of better understanding the
> performance effect that the ratio of updates to queries has on query
> performance, we tested the following two scenarios.
>
> scenario 1:
> 10 threads doing updates to docs w/ softCommit=true and waitSearcher=true
> 20 threads doing queries
>
> scenario 2:
> 20 threads doing updates to docs w/ softCommit=true and waitSearcher=true
> 10 threads doing queries
>
> - assume that in both scenarios system load, GC, etc... is not contributing
> to any performance degradation
> - both scenarios pretty much ensure that the Solr caches will always be
> invalidated, so the queries in either scenario don't get the benefit of the
> cache
>
> scenario 2 has much worse query performance. Why?
>
>
>


Re: SolrCloud replicas out of sync

2016-01-26 Thread Jeff Wartes

Ah, perhaps you fell into something like this then? 
https://issues.apache.org/jira/browse/SOLR-7844

That says it’s fixed in 5.4, but that would be an example of a split-brain type 
incident, where different documents were accepted by different replicas who 
each thought they were the leader. If this is the case, and you actually have 
different data on each replica, I’m not aware of any way to fix the problem 
short of reindexing those documents. Before that, you’ll probably need to 
choose a replica and just force the others to get in sync with it. I’d choose 
the current leader, since that’s slightly easier.

Typically, a leader writes an update to its transaction log, then sends the 
request to all replicas, and when those all finish it acknowledges the update. 
If a replica gets restarted, and is less than N documents behind, the leader 
will only replay that transaction log. (Where N is the numRecordsToKeep 
configured in the updateLog section of solrconfig.xml)

What you want is to provoke the heavy-duty process normally invoked if a 
replica has missed more than N docs, which essentially does a checksum and file 
copy on all the raw index files. FetchIndex would probably work, but it’s a 
replication handler API originally designed for master/slave replication, so 
take care: https://wiki.apache.org/solr/SolrReplication#HTTP_API
Probably a lot easier would be to just delete the replica and re-create it. 
That will also trigger a full file copy of the index from the leader onto the 
new replica.
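
A hedged SolrJ 5.x sketch of the delete-and-recreate route (the collection,
shard, and core-node names are assumptions; check CLUSTERSTATUS for the real
ones):

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.common.params.ModifiableSolrParams;

public class RebuildReplica {
  public static void main(String[] args) throws Exception {
    CloudSolrClient client = new CloudSolrClient("zk1:2181,zk2:2181/solr");

    ModifiableSolrParams del = new ModifiableSolrParams();
    del.set("action", "DELETEREPLICA");
    del.set("collection", "blah_blah");
    del.set("shard", "shard1");
    del.set("replica", "core_node2"); // the out-of-sync replica
    QueryRequest delReq = new QueryRequest(del);
    delReq.setPath("/admin/collections");
    client.request(delReq);

    ModifiableSolrParams add = new ModifiableSolrParams();
    add.set("action", "ADDREPLICA");
    add.set("collection", "blah_blah");
    add.set("shard", "shard1"); // new replica does a full copy from the leader
    QueryRequest addReq = new QueryRequest(add);
    addReq.setPath("/admin/collections");
    client.request(addReq);

    client.close();
  }
}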

I think design decisions around Solr generally use CP as a goal. (I sometimes 
wish I could get more AP behavior!) See posts like this: 
http://lucidworks.com/blog/2014/12/10/call-maybe-solrcloud-jepsen-flaky-networks/
 
So the fact that you encountered this sounds like a bug to me.
That said, another general recommendation (of mine) is that you not use Solr as 
your primary data source, so you can rebuild your index from scratch if you 
really need to. 






On 1/26/16, 1:10 PM, "David Smith"  wrote:

>Thanks Jeff!  A few comments
>
>>>
>>> Although you could probably bounce a node and get your document counts back 
>>> in sync (by provoking a check)
>>>
> 
>
>If the check is a simple doc count, that will not work. We have found that 
>replica1 and replica3, although they contain the same doc count, don’t have 
>the SAME docs.  They each missed at least one update, but of different docs.  
>This also means none of our three replicas are complete.
>
>>>
>>>it’s interesting that you’re in this situation. It implies to me that at 
>>>some point the leader couldn’t write a doc to one of the replicas,
>>>
>
>That is our belief as well. We experienced a datacenter-wide network 
>disruption of a few seconds, and user complaints started the first workday 
>after that event.  
>
>The most interesting log entry during the outage is this:
>
>"1/19/2016, 5:08:07 PM ERROR null DistributedUpdateProcessorRequest says it is 
>coming from leader,​ but we are the leader: 
>update.distrib=FROMLEADER=http://dot.dot.dot.dot:8983/solr/blah_blah_shard1_replica3/=javabin=2;
>
>>>
>>> You might watch the achieved replication factor of your updates and see if 
>>> it ever changes
>>>
>
>This is a good tip. I’m not sure I like the implication that any failure to 
>write all 3 of our replicas must be retried at the app layer.  Is this really 
>how SolrCloud applications must be built to survive network partitions without 
>data loss? 
>
>Regards,
>
>David
>
>
>On 1/26/16, 12:20 PM, "Jeff Wartes"  wrote:
>
>>
>>My understanding is that the "version" represents the timestamp the searcher 
>>was opened, so it doesn’t really offer any assurances about your data.
>>
>>Although you could probably bounce a node and get your document counts back 
>>in sync (by provoking a check), it’s interesting that you’re in this 
>>situation. It implies to me that at some point the leader couldn’t write a 
>>doc to one of the replicas, but that the replica didn’t consider itself down 
>>enough to check itself.
>>
>>You might watch the achieved replication factor of your updates and see if it 
>>ever changes:
>>https://cwiki.apache.org/confluence/display/solr/Read+and+Write+Side+Fault+Tolerance
>> (See Achieved Replication Factor/min_rf)
>>
>>If it does, that might give you clues about how this is happening. Also, it 
>>might allow you to work around the issue by trying the write again.
>>
>>
>>
>>
>>
>>
>>On 1/22/16, 10:52 AM, "David Smith"  wrote:
>>
>>>I have a SolrCloud v5.4 collection with 3 replicas that appear to have 
>>>fallen permanently out of sync.  Users started to complain that the same 
>>>search, executed twice, sometimes returned different result counts.  Sure 
>>>enough, our replicas are not identical:
>>>
>>>>> shard1_replica1:  89867 documents / version 1453479763194
>>>>> shard1_replica2:  89866 documents / version 1453479763194
> 

Re: Solr partial date range search

2016-01-26 Thread vsriram30
Yes, Erick. I have been using that full-date-format range query till now, and
we have a requirement change to search based on partial date ranges. Hence I
was looking at these options.

Kind Regards,
-Sriram





Re: Solr partial date range search

2016-01-26 Thread Erick Erickson
Still, I have to ask why bother? Presumably you have some kind of front-end
that takes the dates. Simply have that form the proper full date specification.

Or create a query component that intercepts the query on the Solr side and
massages it enough to form the full date. Or...

Using strings and wildcards is the _last_ thing I'd do, and that only
in extremis.

Best,
Erick

On Tue, Jan 26, 2016 at 2:55 PM, vsriram30  wrote:
> Yes, Erick. I have been using that full-date-format range query till now, and
> we have a requirement change to search based on partial date ranges. Hence I
> was looking at these options.
>
> Kind Regards,
> -Sriram
>
>
>


Re: indexing rich data with solr 5.3.1 integreting in Ubuntu server

2016-01-26 Thread Emir Arnautovic

Hi,
I would first check if external libraries are present and loaded. How do 
you start Solr? Try explicitly setting solr.install.dir or set an absolute 
path to the libs and see in the logs if they are loaded.

<lib dir="${solr.install.dir:../../../..}/contrib/extraction/lib" regex=".*\.jar" />


Thanks,
Emir

On 25.01.2016 15:16, kostali hassan wrote:



I have a problem with integrating Solr on an Ubuntu server. Before using Solr
on the Ubuntu server I tested it on my Mac, where it was working perfectly for
the DIH request handler and update/extract: it indexed my PDF, Doc, and Docx
documents. After installing Solr on the Ubuntu server and using the same
configuration files and libraries, I've found that Solr doesn't index PDF
documents, with no errors or exceptions in the Solr log. But I can search over
.doc and .docx documents.

Here are some parts of my solrconfig.xml contents:

<lib dir="${solr.install.dir:../../../..}/contrib/extraction/lib" regex=".*\.jar" />
<lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-cell-\d.*\.jar" />

<requestHandler name="/update/extract" startup="lazy"
                class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="lowernames">true</str>
    <str name="fmap.meta">ignored_</str>
    <str name="fmap.content">_text_</str>
  </lst>
</requestHandler>

DIH config:

<requestHandler name="/dataimport"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">tika.config.xml</str>
  </lst>
</requestHandler>

tika.config.xml:

<dataConfig>
  <dataSource type="BinFileDataSource" name="files"/>
  <document>
    <entity name="files" processor="FileListEntityProcessor"
            dataSource="null" rootEntity="false"
            baseDir="D:\Lucene\document"
            fileName=".*\.(DOC)|(PDF)|(pdf)|(doc)|(docx)|(ppt)"
            onError="skip"
            recursive="true">
      <!-- ... -->
      <entity name="documentImport"
              dataSource="files"
              processor="TikaEntityProcessor"
              url="${files.fileAbsolutePath}"
              format="text">
        <field column="title" name="title" meta="true"/>
        <field column="text" name="content"/>
        <field column="..." name="LastModifiedBy" meta="true"/>
      </entity>
    </entity>
  </document>
</dataConfig>


--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/



Re: To Detect Wheter Core is Available To Post

2016-01-26 Thread Emir Arnautovic

Hi Edwin,
Assuming you are using SolrCloud - why do you need a specific core? Can 
you use one of the status actions from the Collections API? There is a 
CLUSTERSTATUS action.
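
For instance, a hedged SolrJ 5.x sketch (the collection name is an
assumption):

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.common.params.ModifiableSolrParams;
import org.apache.solr.common.util.NamedList;

public class ClusterHealth {
  public static void main(String[] args) throws Exception {
    CloudSolrClient client = new CloudSolrClient("zk1:2181,zk2:2181/solr");
    ModifiableSolrParams params = new ModifiableSolrParams();
    params.set("action", "CLUSTERSTATUS");
    params.set("collection", "logs");
    QueryRequest request = new QueryRequest(params);
    request.setPath("/admin/collections");
    NamedList<Object> status = client.request(request);
    System.out.println(status); // per-replica state: active/recovering/down
    client.close();
  }
}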


Thanks,
Emir

On 26.01.2016 05:34, Edwin Lee wrote:

Hi All,

Our team is using Solr to process logs and we have met a problem with Solr
posting.

We want to detect the health of each core - whether it is available to
post to. We have considered these ways to do that:

1. Using a Luke request. - The cost is a bit high because of core loading.
2. We have designed a cache, adding hooks for when a core is opened or
closed, to record whether the core is loaded. - *Question: if a core is
loaded, is there a situation where we still cannot post data to it?*
3. We try to post some meaningless data with our own unique id, and delete
that data within the same commit. For example, if we use JSON to post data,
it looks like this:

{
  "add": {
    "doc": {
      "id": "%%ID%%"
    }
  },
  "delete": { "id": "%%ID%%" },
  "commit": {}
}

*But we are still not 100% sure whether it will mess up our normal data.*

What is the best way for this requirement? We want to consult your opinions.

Thank you.

Regards,
Edwin Lee
20160126



--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/