Re: Indexing using CSV

2016-03-21 Thread Zheng Lin Edwin Yeo
Are you using post.jar or other methods of indexing the CSV file?

You have to ensure that the number of fields in your CSV file is the same
as the number of fields in Solr.
Also, each record in the CSV file must be on a separate line, and each
record must have the same number of fields, with each field separated by a
"," (even for fields that are empty).

Regards,
Edwin


On 21 March 2016 at 23:46, Paul Hoffman  wrote:

> On Sun, Mar 20, 2016 at 06:11:32PM -0700, Jay Potharaju wrote:
> > Hi,
> > I am trying to index some data using csv files. The data contains
> > description column, which can include quotes, comma, LF/CR & other
> special
> > characters.
> >
> > I have it working but run into an issue with the following error
> >
> > line=5,can't read line: 5 values={NO LINES AVAILABLE}.
> >
> > What is the best way to debug this issue and secondly how do other people
> > handle indexing data using csv data.
>
> I would concentrate first on getting the CSV reader working verifiably,
> which might be the hardest part -- CSV is not a file format, it's a
> hodgepodge.
>
> Paul.
>
> --
> Paul Hoffman 
> Systems Librarian
> Fenway Libraries Online
> c/o Wentworth Institute of Technology
> 550 Huntington Ave.
> Boston, MA 02115
> (617) 442-2384 (FLO main number)
>


Re: How fast indexing?

2016-03-21 Thread Shawn Heisey
On 3/21/2016 7:48 PM, Amit Jha wrote:
> When I run the same SQL on the DB it takes only 1 sec. And only 6-7
> documents are getting indexed per second.

That's really slow.  It seems likely that you are having extreme
performance issues due to garbage collection problems, possibly from a
heap that needs to be larger.  I will need a lot more information about
your hardware/Solr setup to figure anything out.  Some info that might
be useful:

* Solr version.
* RAM installed in each machine.
* The max heap size on each machine.
* The amount of index data contained on each machine.
* How many Solr documents live on each machine.
* Anything else you can think of that might be helpful.

> As I have a 4-node SolrCloud setup, can I run 4 import handlers to index the
> same data? Will it not overwrite?

DIH is generally not the best way to index to SolrCloud.  The DIH
feature was created *long* before SolrCloud ever existed -- it was
designed for single-core indexes.  The best option for indexing to
SolrCloud is a SolrJ program using CloudSolrClient, or another program
that can create indexing requests you can send to the /update handler,
ideally having multiple requests in parallel.
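
For reference, a minimal CloudSolrClient sketch of that approach -- the
ZooKeeper addresses, collection name, and field names below are placeholders
for illustration, not details from this thread:

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class CloudIndexer {
      public static void main(String[] args) throws Exception {
        CloudSolrClient client = new CloudSolrClient("zk1:2181,zk2:2181,zk3:2181");
        client.setDefaultCollection("collection1");  // placeholder collection

        List<SolrInputDocument> batch = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
          SolrInputDocument doc = new SolrInputDocument();
          doc.addField("id", Integer.toString(i));
          doc.addField("title_txt", "document " + i);
          batch.add(doc);
        }
        client.add(batch);  // one batched request instead of 1000 single adds
        client.commit();
        client.close();
      }
    }

Running several loops like that from separate threads gives you the parallel
requests mentioned above.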

> 10-20K is very high; where can I get the actual size of a document?

You'd need to check your database and add up the sizes of all the columns
that Solr is indexing for a typical document.

Thanks,
Shawn



Re: How fast indexing?

2016-03-21 Thread Amit Jha
When I run the same SQL on the DB it takes only 1 sec. And only 6-7
documents are getting indexed per second.

As I have a 4-node SolrCloud setup, can I run 4 import handlers to index the
same data? Will it not overwrite?

10-20K is very high; where can I get the actual size of a document?

Rgds
AJ

> On 22-Mar-2016, at 05:32, Shawn Heisey  wrote:
> 
>> On 3/20/2016 6:11 PM, Amit Jha wrote:
>> In my case I am using DIH to index the data, and the query has 2 join
>> statements. Indexing 70K documents takes 3-4 hours. Document size would be
>> around 10-20KB. The DB is MSSQL, and I am using solr4.2.10 in cloud mode.
> 
> My source data is in a MySQL database.  I use DIH for full rebuilds and
> SolrJ for maintenance.
> 
> My index is sharded, but I'm not running SolrCloud.  When using DIH, all
> of my shards build at once, and each one achieves about 750 docs per
> second.  With six large shards, rebuilding a 146 million document index
> takes 9-10 hours.  It produces a total index size in the ballpark of 170GB.
> 
> DIH has a performance limitation -- it's single-threaded.  I obtain the
> speeds that I do because all of my shards import at the same time -- six
> dataimport instances running at the same time, each one with a single
> thread, importing a little more than 24 million documents.  I have
> discovered that Solr is the bottleneck on my setup.  The data retrieval
> from MySQL can proceed much faster than Solr can handle with a single
> indexing thread.  My situation is a little bit unusual -- as Erick
> mentioned, usually the bottleneck is data retrieval, not Solr.
> 
> At this point, if I want to make bulk indexing go faster, I need to
> build a SolrJ application that can index with multiple threads to each
> Solr core at the same time.  This is on my roadmap, but it's not going
> to be a trivial project.
> 
> At 10-20K, your documents are large, but not excessively so.  If 70K
> documents take 3-4 hours, then one of a few problems is happening.
>
> 1) Your database is VERY slow.
> 2) Your analysis chain in schema.xml is running SUPER slow analysis
> components.
> 3) Your server or its configuration is not providing enough resources
> (CPU/RAM/IO) for Solr to run efficiently.
> 
> #2 seems rather unlikely, so I would suspect one of the other two.
> 
> 
> 
> I have seen one situation related to the Microsoft side of your setup
> that might cause a problem like this.  If any of your machines are
> running on Windows Server 2012 and you have bridged NICs (usually for
> failover in the event of a switch failure), then you will need to break
> the bridge and just run one NIC.
> 
> The performance improvement on the network when a bridged NIC is removed
> from Server 2012 is enough to blow your mind, especially if the access
> is over a high-latency network link, like a VPN or WAN connection.  The
> same setup on Server 2003 or Server 2008 has very good performance.
> Microsoft seems to have a bug with bridged NICs in Server 2012.  Last
> time I tried to figure out whether it could be fixed, I ran into this
> problem:
> 
> https://xkcd.com/979/
> 
> Thanks,
> Shawn
> 


Re: How fast indexing?

2016-03-21 Thread Amit Jha
Yes, I do have multiple nodes in my SolrCloud setup.

Rgds
AJ

> On 21-Mar-2016, at 22:20, fabigol  wrote:
> 
> Amit Jha,
> do you have several Solr servers with SolrCloud?
> 
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/How-fast-indexing-tp4264994p4265122.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: PERFORMANCE WARNING: Overlapping onDeckSearchers=2

2016-03-21 Thread Shawn Heisey
On 3/21/2016 6:49 PM, Aswath Srinivasan (TMS) wrote:
>>> Thank you for the responses. The collection crashes as in: I'm unable to open
>>> the core tab in the Solr console, search is not returning, and none of the
>>> pages open in the Solr admin dashboard.
>>>
>>> I do understand how and why this issue occurs, and I'm going to do all it
>>> takes to avoid it. However, in the event of accidental hard commits close to
>>> each other that throw this WARN, I'm just trying to figure out a way to make
>>> my collection return results without having to delete and re-create the
>>> collection or delete the data folder.
>>>
>>> Again, I know how to avoid this issue, but if it still happens, what can be
>>> done to avoid a complete reindex?

If you're not actually hitting OutOfMemoryError, then my best guess
about what's happening is that you are running right at the edge of the
available Java heap memory, so your JVM is constantly running full
garbage collections to free up enough memory for normal operation.  In
this situation, Solr is actually still running, but is spending most of
its time paused for garbage collection.

https://wiki.apache.org/solr/SolrPerformanceProblems#GC_pause_problems

The first part of the "GC pause problems" section on the above wiki page
talks about very large heaps, but there is a paragraph just before
"Tools and Garbage Collection" that talks about heaps that are a little
bit too small.

If I'm right about this, you're going to need to increase your java heap
size.  Exactly how to do this will depend on what version of Solr you're
running, how you installed it, and how you start it.

For 5.x versions using the included scripts, you can use the "-m" option
on the "bin/solr" command when you start Solr manually, or you can edit
the solr.in.sh file (usually found in /etc/default or /var/solr) if you
used the service installer script on a UNIX/Linux platform.  The default
heap size in 5.x scripts is 512MB, which is VERY small.
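
For example, either of the following raises the heap to 2GB (2g is an arbitrary
illustration, not a sizing recommendation):

    bin/solr start -m 2g

or, in solr.in.sh:

    SOLR_HEAP="2g"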

For earlier versions, there are too many install/start options available.
There were no installation scripts included with Solr itself, so I won't
know anything about the setup.

If I'm wrong about what's happening, then we'll need a lot more details
about your server and your Solr setup.

Thanks,
Shawn



RE: PERFORMANCE WARNING: Overlapping onDeckSearchers=2

2016-03-21 Thread Aswath Srinivasan (TMS)
>>The only way that I can imagine any part of Solr *crashing* when this message
>>happens is if you are also hitting an OutOfMemoryError exception. You've said
>>that your collection crashes ... but not what actually happens -- what "crash"
>>means for your situation. I've never heard of a collection crashing.



>>If you're running version 4.0 or later, you actually *do* want autoCommit 
>>configured, with openSearcher set to false.  This configuration will not 
>>change document visibility at all, because it will not open a new searcher.  
>>You need different commits for document visibility.


Thank you for the responses. The collection crashes as in: I'm unable to open
the core tab in the Solr console, search is not returning, and none of the
pages open in the Solr admin dashboard.

I do understand how and why this issue occurs, and I'm going to do all it takes
to avoid it. However, in the event of accidental hard commits close to each
other that throw this WARN, I'm just trying to figure out a way to make my
collection return results without having to delete and re-create the collection
or delete the data folder.

Again, I know how to avoid this issue, but if it still happens, what can be
done to avoid a complete reindex?

Thank you,
Aswath NS

-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org]
Sent: Monday, March 21, 2016 4:19 PM
To: solr-user@lucene.apache.org
Subject: Re: PERFORMANCE WARNING: Overlapping onDeckSearchers=2

On 3/21/2016 12:52 PM, Aswath Srinivasan (TMS) wrote:
> Fellow developers,
>
> PERFORMANCE WARNING: Overlapping onDeckSearchers=2
>
> I'm seeing this warning often and whenever I see this, the collection 
> crashes. The only way to overcome this is by deleting the data folder and 
> reindexing.
>
> In my observation, this WARN comes when I hit frequent hard commits or hit
> re-load config. I'm not planning to hit frequent hard commits, however
> sometimes it happens accidentally. And when it happens, the collection crashes
> without a recovery.
>
> Have you faced this issue? Is there a recovery procedure for this WARN?
>
> Also, I don't want to increase maxWarmingSearchers or set autocommit.

This is a lot of the same info that you've gotten from Hoss. I'm just going to 
leave it all here and add a little bit related to the rest of the thread.

Increasing maxWarmingSearchers is almost always the WRONG thing to do.
The reason that you are running into this message is that your commits (those 
that open a new searcher) are taking longer to finish than your commit 
frequency, so you end up warming multiple searchers at the same time. To limit 
memory usage, Solr will keep the number of warming searchers from exceeding a
threshold.

You need to either reduce the frequency of your commits that open a new 
searcher or change your configuration so they complete faster. Here's some info 
about slow commits:

http://wiki.apache.org/solr/SolrPerformanceProblems#Slow_commits

The only way that I can imagine any part of Solr *crashing* when this message 
happens is if you are also hitting an OutOfMemoryError
exception. You've said that your collection crashes ... but not what
actually happens -- what "crash" means for your situation. I've never heard of 
a collection crashing.

If you're running version 4.0 or later, you actually *do* want autoCommit 
configured, with openSearcher set to false. This configuration will not change 
document visibility at all, because it will not open a new searcher. You need 
different commits for document visibility.

This is the updateHandler config that I use which includes autoCommit:



<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>120000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
</updateHandler>




With this config, there will be at least two minutes between automatic hard 
commits. Because these commits will not open a new searcher, they cannot cause 
the message about onDeckSearchers. Commits that do not open a new searcher will 
normally complete VERY quickly. The reason you want this kind of autoCommit 
configuration is to avoid extremely large transaction logs.

See this blog post for more info than you ever wanted about commits:

http://lucidworks.com/blog/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

If you're going to do all your indexing with the dataimport handler, you could 
just let the commit option on the dataimport take care of document visibility.

Thanks,
Shawn


Re: How fast indexing?

2016-03-21 Thread Shawn Heisey
On 3/20/2016 6:11 PM, Amit Jha wrote:
> In my case I am using DIH to index the data, and the query has 2 join
> statements. Indexing 70K documents takes 3-4 hours. Document size would be
> around 10-20KB. The DB is MSSQL, and I am using solr4.2.10 in cloud mode.

My source data is in a MySQL database.  I use DIH for full rebuilds and
SolrJ for maintenance.

My index is sharded, but I'm not running SolrCloud.  When using DIH, all
of my shards build at once, and each one achieves about 750 docs per
second.  With six large shards, rebuilding a 146 million document index
takes 9-10 hours.  It produces a total index size in the ballpark of 170GB.

DIH has a performance limitation -- it's single-threaded.  I obtain the
speeds that I do because all of my shards import at the same time -- six
dataimport instances running at the same time, each one with a single
thread, importing a little more than 24 million documents.  I have
discovered that Solr is the bottleneck on my setup.  The data retrieval
from MySQL can proceed much faster than Solr can handle with a single
indexing thread.  My situation is a little bit unusual -- as Erick
mentioned, usually the bottleneck is data retrieval, not Solr.

At this point, if I want to make bulk indexing go faster, I need to
build a SolrJ application that can index with multiple threads to each
Solr core at the same time.  This is on my roadmap, but it's not going
to be a trivial project.
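
One possible shape for such an indexer is SolrJ's ConcurrentUpdateSolrClient,
which queues documents and streams them to a single core from multiple
background threads. A minimal sketch -- the URL, queue size, and thread count
are invented for illustration:

    import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class ParallelIndexer {
      public static void main(String[] args) throws Exception {
        // Buffer up to 10000 docs and send them with 4 background threads.
        ConcurrentUpdateSolrClient client = new ConcurrentUpdateSolrClient(
            "http://localhost:8983/solr/core1", 10000, 4);

        for (int i = 0; i < 1000000; i++) {
          SolrInputDocument doc = new SolrInputDocument();
          doc.addField("id", Integer.toString(i));
          client.add(doc);           // queued; worker threads stream to Solr
        }
        client.blockUntilFinished(); // wait for the queue to drain
        client.commit();
        client.close();
      }
    }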

At 10-20K, your documents are large, but not excessively so.  If 70K
documents take 3-4 hours, then one of a few problems is happening.

1) Your database is VERY slow.
2) Your analysis chain in schema.xml is running SUPER slow analysis
components.
3) Your server or its configuration is not providing enough resources
(CPU/RAM/IO) for Solr to run efficiently.

#2 seems rather unlikely, so I would suspect one of the other two.



I have seen one situation related to the Microsoft side of your setup
that might cause a problem like this.  If any of your machines are
running on Windows Server 2012 and you have bridged NICs (usually for
failover in the event of a switch failure), then you will need to break
the bridge and just run one NIC.

The performance improvement on the network when a bridged NIC is removed
from Server 2012 is enough to blow your mind, especially if the access
is over a high-latency network link, like a VPN or WAN connection.  The
same setup on Server 2003 or Server 2008 has very good performance.
Microsoft seems to have a bug with bridged NICs in Server 2012.  Last
time I tried to figure out whether it could be fixed, I ran into this
problem:

https://xkcd.com/979/

Thanks,
Shawn



Re: PERFORMANCE WARNING: Overlapping onDeckSearchers=2

2016-03-21 Thread Shawn Heisey
On 3/21/2016 12:52 PM, Aswath Srinivasan (TMS) wrote:
> Fellow developers,
>
> PERFORMANCE WARNING: Overlapping onDeckSearchers=2
>
> I'm seeing this warning often and whenever I see this, the collection 
> crashes. The only way to overcome this is by deleting the data folder and 
> reindexing.
>
> In my observation, this WARN comes when I hit frequent hard commits or hit
> re-load config. I'm not planning to hit frequent hard commits, however
> sometimes it happens accidentally. And when it happens, the collection crashes
> without a recovery.
>
> Have you faced this issue? Is there a recovery procedure for this WARN?
>
> Also, I don't want to increase maxWarmingSearchers or set autocommit.

This is a lot of the same info that you've gotten from Hoss.  I'm just
going to leave it all here and add a little bit related to the rest of
the thread.

Increasing maxWarmingSearchers is almost always the WRONG thing to do. 
The reason that you are running into this message is that your commits
(those that open a new searcher) are taking longer to finish than your
commit frequency, so you end up warming multiple searchers at the same
time.  To limit memory usage, Solr will keep the number of warming
searchers from exceeding a threshold.
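
That threshold is the maxWarmingSearchers setting in solrconfig.xml; for
reference, it is declared like this (2 being the usual default):

    <maxWarmingSearchers>2</maxWarmingSearchers>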

You need to either reduce the frequency of your commits that open a new
searcher or change your configuration so they complete faster.  Here's
some info about slow commits:

http://wiki.apache.org/solr/SolrPerformanceProblems#Slow_commits

The only way that I can imagine any part of Solr *crashing* when this
message happens is if you are also hitting an OutOfMemoryError
exception.   You've said that your collection crashes ... but not what 
actually happens -- what "crash" means for your situation.  I've never
heard of a collection crashing.

If you're running version 4.0 or later, you actually *do* want
autoCommit configured, with openSearcher set to false.  This
configuration will not change document visibility at all, because it
will not open a new searcher.  You need different commits for document
visibility.

This is the updateHandler config that I use which includes autoCommit:



  
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>120000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
</updateHandler>
  
  


With this config, there will be at least two minutes between automatic
hard commits.  Because these commits will not open a new searcher, they
cannot cause the message about onDeckSearchers.  Commits that do not
open a new searcher will normally complete VERY quickly.  The reason you
want this kind of autoCommit configuration is to avoid extremely large
transaction logs.

See this blog post for more info than you ever wanted about commits:

http://lucidworks.com/blog/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

If you're going to do all your indexing with the dataimport handler, you
could just let the commit option on the dataimport take care of document
visibility.

Thanks,
Shawn



Suspicious message with attachment

2016-03-21 Thread help
The following message addressed to you was quarantined because it likely 
contains a virus:

Subject: RE: PERFORMANCE WARNING: Overlapping onDeckSearchers=2
From: "Aswath Srinivasan (TMS)" 

However, if you know the sender and are expecting an attachment, please reply 
to this message, and we will forward the quarantined message to you.


RE: PERFORMANCE WARNING: Overlapping onDeckSearchers=2

2016-03-21 Thread Aswath Srinivasan (TMS)
If you're seeing a crash, then that's a distinct problem from the WARN -- it
might be related to the warning, but it's not identical -- Solr doesn't always
(or even normally) crash in the "Overlapping onDeckSearchers" situation.

That is what I hoped for. But I could see nothing else in the log. All I'm
trying to do is run a full import in the DIH handler, index some 10 records
from the DB, and check the "commit" checkbox. Then when I immediately re-run the
full import again OR do a reload config, I start seeing this warning and my
collection crashes.

I have turned off autocommit in the solrconfig.

I can try and avoid frequent hard commits but I wanted a solution to overcome 
this WARN if an accidental frequent hard commit happens.

Thank you,
Aswath NS



-Original Message-
From: Chris Hostetter [mailto:hossman_luc...@fucit.org]
Sent: Monday, March 21, 2016 2:26 PM
To: solr-user@lucene.apache.org
Subject: RE: PERFORMANCE WARNING: Overlapping onDeckSearchers=2


: What I'm wondering is, what should one do to fix this issue when it
: happens. Is there a way to recover after the WARN appears?

It's just a warning that you have a sub-optimal situation from a performance 
standpoint -- either committing too fast, or warming too much.
It's not a failure, and Solr will continue to serve queries and process updates 
-- but meanwhile it's detected that the situation it's in involves wasted 
CPU/RAM.

: In my observation, this WARN comes when I hit frequent hard commits or
: hit re-load config. I'm not planning to hit frequent hard commits,
: however sometimes it happens accidentally. And when it happens, the
: collection crashes without a recovery.

If you're seeing a crash, then that's a distinct problem from the WARN -- it
might be related to the warning, but it's not identical -- Solr doesn't always
(or even normally) crash in the "Overlapping onDeckSearchers"
situation.

So if you are seeing crashes, please give us more details about these
crashes: namely, more details about everything you are seeing in your logs (on
all the nodes, even if only one node is crashing).

https://wiki.apache.org/solr/UsingMailingLists



-Hoss
http://www.lucidworks.com/


RE: PERFORMANCE WARNING: Overlapping onDeckSearchers=2

2016-03-21 Thread Chris Hostetter

: What I'm wondering is, what should one do to fix this issue when it
: happens. Is there a way to recover after the WARN appears?

It's just a warning that you have a sub-optimal situation from a 
performance standpoint -- either committing too fast, or warming too much.  
It's not a failure, and Solr will continue to serve queries and process 
updates -- but meanwhile it's detected that the situation it's in involves 
wasted CPU/RAM.

: In my observation, this WARN comes when I hit frequent hard commits or
: hit re-load config. I'm not planning to hit frequent hard commits,
: however sometimes it happens accidentally. And when it happens, the
: collection crashes without a recovery.

If you're seeing a crash, then that's a distinct problem from the WARN --
it might be related to the warning, but it's not identical -- Solr doesn't
always (or even normally) crash in the "Overlapping onDeckSearchers"
situation.

So if you are seeing crashes, please give us more details about these
crashes: namely, more details about everything you are seeing in your logs
(on all the nodes, even if only one node is crashing).

https://wiki.apache.org/solr/UsingMailingLists



-Hoss
http://www.lucidworks.com/


RE: PERFORMANCE WARNING: Overlapping onDeckSearchers=2

2016-03-21 Thread Aswath Srinivasan (TMS)
Please note that I'm not looking to find ways to avoid this issue. There are a
lot of internet articles on this topic.

What I'm wondering is, what should one do to fix this issue when it happens. Is
there a way to recover after the WARN appears?

Thank you,
Aswath NS

-Original Message-
From: Aswath Srinivasan (TMS) [mailto:aswath.sriniva...@toyota.com] 
Sent: Monday, March 21, 2016 11:52 AM
To: solr-user@lucene.apache.org
Subject: PERFORMANCE WARNING: Overlapping onDeckSearchers=2

Fellow developers,

PERFORMANCE WARNING: Overlapping onDeckSearchers=2

I'm seeing this warning often and whenever I see this, the collection crashes. 
The only way to overcome this is by deleting the data folder and reindexing.

In my observation, this WARN comes when I hit frequent hard commits or hit
re-load config. I'm not planning to hit frequent hard commits, however
sometimes it happens accidentally. And when it happens, the collection crashes
without a recovery.

Have you faced this issue? Is there a recovery procedure for this WARN?

Also, I don't want to increase maxWarmingSearchers or set autocommit.

Thank you,
Aswath NS


RE: Explain score is different from score

2016-03-21 Thread Rick Sullivan
I haven't checked this thread since Friday, but here are my responses to the 
questions that have come up.

1. How is ranking affected?

Some documents have their scores divided by an integer value in the response 
documents.

2. Do you see the proper ranking in the explain section?

Yes, the explain section always seems to have consistent values and proper 
rankings.

3. What about the results?

No, these are ranked according to the sometimes incorrect score.

4. What version of Solr are you using?

I've produced the problem on SolrCloud 5.5.0 (2 shards on 2 nodes on the same 
machine), Solr 5.5.0 (no sharding), and Solr 5.4.1 (no sharding).
I've also had trouble reproducing the problem on test data.

Thanks,
-Rick


> Date: Mon, 21 Mar 2016 14:14:44 +
> From: iori...@yahoo.com.INVALID
> To: solr-user@lucene.apache.org
> Subject: Re: Explain score is different from score
>
>
>
> Hi Alessandro,
>
> The OP has different rankings: fl=score and explain's score retrieve
> different orders.
> I wrote test cases using ClassicSimilarity, but it won't reproduce.
> This is really weird. I wonder what is triggering this.
>
> aHmet
>
>
> On Monday, March 21, 2016 2:08 PM, Alessandro Benedetti 
>  wrote:
>
>
>
> I would like to add a question: how is the ranking affected?
> Do you see the proper ranking in the explain section?
> And what about the results? Are they ranked according to the correct score,
> or are they ranked by the wrong score?
> I got a similar issue, which I am not able to reproduce yet, but it was
> really, really weird (in my case the ranking was also messed up).
>
> Cheers
>
>
> On Mon, Mar 21, 2016 at 7:30 AM, G, Rajesh  wrote:
>
>> Hi Ahmet,
>>
>> I am using Solr 5.5.0. I am running a single instance with a single core. No
>> shards.
>>
>> I have added  to my schema
>> as suggested by Rick Sullivan. Now the scores are the same between explain
>> and the score field.
>>
>> But instead of previous results "Lync - Microsoft Office 365" and
>> "Microsoft Office 365" I am getting
>>
>> {
>> "title":"Office 365",
>> "score":7.471676
>> },
>> {
>> "title":"Office 365",
>> "score":7.471676
>> },
>>
>> If I try NGram title:(Microsoft Ofice 365)
>>
>> The scores are the same for the top 10 results even though the titles differ
>> by a minimum of 3 characters. I have attached my schema.xml so it can help.
>>
>> Lync - Microsoft Office 365 -- 52.056263
>> Microsoft Office 365 -- 52.056263
>> Microsoft Office 365 1.0 -- 52.056263
>> Microsoft Office 365 14.0 -- 52.056263
>> Microsoft Office 365 14.3 -- 52.056263
>> Microsoft Office 365 14.4 -- 52.056263
>> Microsoft Office 365 14.5(Mac) -- 52.056263
>> Microsoft Office 365 15.0 -- 52.056263
>> Microsoft Office 365 16.0 -- 52.056263
>> Microsoft Office 365 4.0 -- 52.056263
>> Microsoft Office 365 E4 -- 52.056263
>> Microsoft Mail Protection Reports for Office 365 15.0 -- 50.215454
>>
>> Thanks
>> Rajesh
>>
>>
>>
>>
>> -Original Message-
>> From: Ahmet Arslan [mailto:iori...@yahoo.com]
>> Sent: Sunday, March 20, 2016 2:10 AM
>> To: solr-user@lucene.apache.org; G, Rajesh ;
>> r...@ricksullivan.net
>> Subject: Re: Explain score is different from score
>>
>> Hi Rick and Rajesh,
>>
>> I wasn't able re-produce this neither with lucene nor solr.
>> What version of solr is this?
>> Are you using a sharded request?
>>
>> @BeforeClass
>> public static void beforeClass() throws Exception {
>> initCore("solrconfig.xml", "schema.xml");
>>
>> assertU(adoc("id", "1722669", "title", "Lync - Microsoft Office 365"));
>> assertU(adoc("id", "2043876", "title", "Microsoft Office 365"));
>>
>> assertU(commit());
>>
>> }
>>
>> /**
>>  * Checks whether fl=score equals to Explain's score
>>  */
>> @Test
>> public void testExplain() throws Exception {
>>   SolrQueryRequest req = req(CommonParams.DEBUG_QUERY, "true",
>>       "indent", "true",
>>       "q", "title:(Microsoft Ofice 365)",
>>       CommonParams.FL, "id,title,score");
>>   String response = h.query(req);
>>   System.out.println(response);
>> }
>>
>> @Test
>> public void testExplain() throws Exception {

Re: Save Number of words in field

2016-03-21 Thread Jack Krupansky
You can write an Update Request Processor that would count the words in the
source value for a specified field and generate that count as an integer
value for another field.

My old Solr 4.x Deep Dive book has an example that uses a sequence (chain)
of existing update processors to count words in a multi-valued text field.
That's not as efficient as a custom or scripted update processor, but it
avoids having to write one.

See:
http://www.lulu.com/us/en/shop/jack-krupansky/solr-4x-deep-dive-early-access-release-7/ebook/product-21203548.html
Look for "regex-count-words".
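
For illustration, a minimal custom-processor sketch (not the book's example;
the field names sentence_txt and word_count_i are arbitrary):

    import java.io.IOException;

    import org.apache.solr.common.SolrInputDocument;
    import org.apache.solr.request.SolrQueryRequest;
    import org.apache.solr.response.SolrQueryResponse;
    import org.apache.solr.update.AddUpdateCommand;
    import org.apache.solr.update.processor.UpdateRequestProcessor;
    import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

    public class WordCountProcessorFactory extends UpdateRequestProcessorFactory {
      @Override
      public UpdateRequestProcessor getInstance(SolrQueryRequest req,
          SolrQueryResponse rsp, UpdateRequestProcessor next) {
        return new UpdateRequestProcessor(next) {
          @Override
          public void processAdd(AddUpdateCommand cmd) throws IOException {
            SolrInputDocument doc = cmd.getSolrInputDocument();
            Object text = doc.getFieldValue("sentence_txt");
            if (text != null) {
              // crude whitespace split; the analyzer may count differently
              int count = text.toString().trim().split("\\s+").length;
              doc.setField("word_count_i", count);
            }
            super.processAdd(cmd);  // pass the document down the chain
          }
        };
      }
    }

The factory would be wired into an updateRequestProcessorChain in
solrconfig.xml, before RunUpdateProcessorFactory, so the count is written at
index time.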


-- Jack Krupansky

On Mon, Mar 21, 2016 at 12:15 PM, G, Rajesh  wrote:

> Hi,
>
> When indexing sentences, I want to store the number of words in the
> sentence in a field that I can use with other queries later for word-count
> matching. Please let me know whether this is possible.
>
> Thanks
> Rajesh
>
>
>
>
>
>


Re: Seasonal searches in SOLR 5.x

2016-03-21 Thread Jack Krupansky
You can write an Update Request Processor which takes a pair of date field
values and creates season code values for a separate field, which could be
multivalued for date ranges spanning seasons. Similarly, you could have
another generated multivalued field which lists the months when the data
was collected. You could decide whether to store this extra info as
alphanumeric codes or as small integers (1-4 for seasons, 1-12 for months).
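
The month expansion itself is a small computation; a sketch in plain Java
(java.time, independent of Solr):

    import java.time.LocalDate;
    import java.util.LinkedHashSet;
    import java.util.Set;

    public class MonthExpander {
      /** Returns the distinct months (1-12) touched by the range [start, end]. */
      static Set<Integer> monthsCovered(LocalDate start, LocalDate end) {
        Set<Integer> months = new LinkedHashSet<>();
        for (LocalDate d = start; !d.isAfter(end) && months.size() < 12;
             d = d.plusMonths(1).withDayOfMonth(1)) {
          months.add(d.getMonthValue());
        }
        months.add(end.getMonthValue());  // make sure the end month is included
        return months;
      }

      public static void main(String[] args) {
        // A sample range spanning a year boundary: Nov 15 to Feb 3.
        System.out.println(monthsCovered(LocalDate.of(2015, 11, 15),
            LocalDate.of(2016, 2, 3)));  // prints [11, 12, 1, 2]
      }
    }

Mapping each month to a season code (1-4) would then be a simple lookup on top
of this.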

-- Jack Krupansky

On Mon, Mar 21, 2016 at 1:26 PM, Ioannis Kirmitzoglou <
ioanniskirmitzog...@gmail.com> wrote:

> Hi all,
>
> I would like to implement seasonal date searches on date ranges. I’m using
> SOLR 5.4.1 and have indexed date ranges using a DateRangeField (let’s call
> this field date_ranges).
> Each document in SOLR corresponds to a biological sample and each sample
> was collected during a date range that can span from a single day to
> multiple years. For my application it makes sense to enable seasonal
> searches, ie find samples that were collected during a specific period of
> the year (e.g. summer, or February). In this type of search, the year that
> the sample was collected is not relevant, only the days of the year. I’ve
> been all over SOLR documentation and I haven’t been able to find anything
> that will enable do me that. The closest I got was a post with instructions
> on how to use a spatial field to do date searches (
> https://people.apache.org/~hossman/spatial-for-non-spatial-meetup-20130117/).
> Using the logic in that post I was able to come up with a solution but it’s
> rather complex and needs polygon searches (which in turn means installing
> the JTS Topology suite).
> Before committing to that I would like to ask for your input and whether
> there’s an easier way to do these types of searches.
>
> Many thanks,
>
> Ioannis
>
> -
> Ioannis Kirmitzoglou, PhD
> Bioinformatician - Scientific Programmer
> Imperial College, London
> www.vectorbase.org
> www.vigilab.org
>
>


PERFORMANCE WARNING: Overlapping onDeckSearchers=2

2016-03-21 Thread Aswath Srinivasan (TMS)
Fellow developers,

PERFORMANCE WARNING: Overlapping onDeckSearchers=2

I'm seeing this warning often and whenever I see this, the collection crashes. 
The only way to overcome this is by deleting the data folder and reindexing.

In my observation, this WARN comes when I hit frequent hard commits or hit
re-load config. I'm not planning to hit frequent hard commits, however
sometimes it happens accidentally. And when it happens, the collection crashes
without a recovery.

Have you faced this issue? Is there a recovery procedure for this WARN?

Also, I don't want to increase maxWarmingSearchers or set autocommit.

Thank you,
Aswath NS


date range faceting on the whole dataset

2016-03-21 Thread Alisa Z .
 Hello,

Is it possible to perform date range faceting on the whole dataset without 
indicating facet.range.start and facet.range.end? 
What if I have no clue about when my data starts and when it ends (it might be
some point in the future)?

A sample query: 
http://localhost:8983/solr/enron-path/select?q=*:*&rows=0&facet=true&facet.range=date_tdt&f.date_tdt.facet.range.start=NOW-20YEARS&f.date_tdt.facet.range.end=NOW-14YEARS&f.date_tdt.facet.range.gap=%2B1DAY&indent=true

However, in this case I found the range.start and range.end points empirically,
and there are still a lot of "blank" periods. Given that I actually need to
step by day, how do I avoid unnecessary calculation on dates that are outside
my data set?
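
(For reference, one two-step workaround: a first request with the
StatsComponent to discover the actual bounds, then a second request that plugs
them into the range facet. The field name is kept from the example above; the
angle-bracket placeholders stand for the min/max values returned by the first
query.)

    http://localhost:8983/solr/enron-path/select?q=*:*&rows=0&stats=true&stats.field=date_tdt

    http://localhost:8983/solr/enron-path/select?q=*:*&rows=0&facet=true&facet.range=date_tdt&facet.range.start=<min-from-stats>&facet.range.end=<max-from-stats>&facet.range.gap=%2B1DAY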

Thanks,

-- 
Alisa Zhila

Solr 3.6 Issues

2016-03-21 Thread Shailendra Tiwari
Hi All,

I am new to Solr, and started to look at our existing Solr implementation.
We have a Solr 3.6 implementation with an index size of 54 GB and a
configuration of 2 masters/3 slaves. Each machine has 32 GB of RAM. The avg.
size of a document is 100KB, and the avg. number of documents being indexed is
3 million a day. We do have sorting on 1 numeric field.

The issues we are facing: sometimes Solr is unresponsive with an OOM error
and needs a restart, and sometimes our queries are not able to get the
result.
We do plan to upgrade to the latest and greatest in a 6-month-to-a-year time
frame, but until then we want to reduce the pain.

Would appreciate any suggestions.

Here is the Solr config, without comments:





[The solrconfig.xml that followed was stripped of its XML markup by the mail
archive, leaving only scattered values; it appears close to the stock Solr 3.x
example config. Recognizable remnants include: abortOnConfigurationError
${solr.abortOnConfigurationError:true}; a data dir of /data/pxs3_data; native
lockType; a dismax request handler with qf "text^0.5 features^1.0 name^1.2
sku^1.5 id^10.0 manu^1.1 cat^1.4", pf "text^0.2 features^1.1 name^1.5 manu^1.4
manu_exact^1.9", bf "popularity^0.5 recip(price,1,1000,1000)^0.3", fl
"id,name,price,score", and mm "2<-1 5<-2 6<90%"; slave replication polling
http://hbpxsolrm03:8080/pxs3/replication every 00:05:00; spellcheck (textSpell,
name field, ./spellchecker); Carrot2 clustering (Lingo and STC); term vector,
terms, elevation (elevate.xml), ping, and highlighting components; a Solritas
/browse handler; and a DataImportHandler using
./dataimporthandler/data-config.xml.]


Re: SolrCloud - Fails to delete documents when some shard is down

2016-03-21 Thread Renaud Delbru

On 21/03/16 14:43, Erick Erickson wrote:

Hmmm, you say "where I have many shards and
can't have one problem  causing no deletion of old data.".

You then have a shard that, when it comes back up still
has all the old data and that _is_ acceptable? Seems like
that would be jarring to the users when some portion of the
docs in their collection reappeared...

But no, there's no similar option for update that I know of.
Solr tries very hard for consistency and this would lead
to an inconsistent data state.

What is the root cause of your shard going down? That's
the fundamental problem here...


As Erick said, what would cause a full shard and all its replicas to go
down at the same time?
Usually, if a shard has multiple replicas and one node goes down, a replica
on another node should take over as leader for this shard, and the delete
queries should work.

--
Renaud Delbru



Best,
Erick

On Mon, Mar 21, 2016 at 7:08 AM, Tali Finelt  wrote:


Hi,

I am using Solr 4.10.2.

When one of the shards in my environment is down and fails to recover -
The process of deleting documents from other shards fails as well.

For example,
When running:
https://<host>:8983/solr/<collection>/update?stream.body=<delete><query>*:*</query></delete>&commit=true

I get the following error message:
No registered leader was found after waiting for 4000ms ,
collection:  slice:  

This causes a problem in a big environment where I have many shards and
can't have one problem  causing no deletion of old data.

Is there a way around that?

To query data in such cases, I use the shards.tolerant=true parameter to
get results even if some shards are down.
Is there something similar for this case?

Thanks,
Tali








Re[2]: [nested] how to specify a path for multiple nesting?

2016-03-21 Thread Alisa Z .
 Thanks, Mikhail. 

I eventually added a distinguishing field "path" and queried unambiguously.  
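
For anyone finding this thread later, the resulting query looks something like
this (a sketch: path_s and its values are illustrative, assuming a field that
holds the full path of each child document; the exact names may differ):

    q={!parent which=type_s:book}+path_s:book.title.keywords +text_t:Neo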

>Thursday, 17 March 2016, 9:46 -04:00 from Mikhail Khludnev
>:
>
>Hello,
>
>Please find inline
>
>On Wed, Mar 16, 2016 at 10:10 PM, Alisa Z.  < prol...@mail.ru > wrote:
>> Hi all,
>>I have a deeply multi-level data structure (up to 6-7 levels deep) where due 
>>to the nature of the data some nested documents can have same type names at 
>>various levels. How to form a proper query on a nested field that would 
>>contain "a path"  that defines that field?
>>
>>I'll clarify with an example:
>>
>>Reduced dataset:
>>
>>[
>> {
>>    id : book1,
>>    type_s:book,
>>    title_t : "The Way of Kings",
>>    author_s : "Brandon Sanderson",
>>    _childDocuments_ : [
>>    {
>> id: book1_c1,
>>    type_s:body,
>>    text_t:"body text of the book... ",
>>    _childDocuments_:[
>>    {id: book2_c1_e1,
>>    type_s:"keywords",
>>    text_t:["The Matrix", "Neo", "character", "somebody", ...]}
>>    ]
>>    },
>>    { id: book1_c2,
>>    type_s:title,
>>    text_t:"This book was too long.",
>>    _childDocuments_:[
>>    {id: book2_c1_e1,
>>    type_s:"keywords",
>>    text_t:["The Matrix", "Neo"]}
>>    ]
>>  }
>>    ]
>> },
>> ...
>>]
>>
>>So there are different paths to text_t field:
>>*  book.body.keywords.text_t
>>*  book.title.keywords.text_t
>>I need to write a query that returns, say, all  books which have  keyword 
>>"Neo"  in their  title  (not body). 
>>I tried :
>>
>>(1)  q={!parent which=type_s:book}type_s:keywords AND text_t:Neo
>>which is obviously incorrect (returns both books whose body keywords and 
>>title keywords contain Neo):
>>
>>(2) q={!parent which=type_s:book}type_s:body^=0{!parent 
>>which=type_s:body}type_s:keywords AND text_t:Neo
>
>I'd say this might work, however I prefer to use v=$foo to break the query up
>unambiguously. Also see
>https://lucidworks.com/blog/2011/12/28/why-not-and-or-and-not/ but make sure
>that + is encoded as %2B in the url.
>
>q={!parent which=type_s:book v=$titles}&titles=+type_s:title^=0 +{!parent
>which='type_s:(body title book)' v=$keywords}&keywords=+type_s:keywords^=0
>+text_t:Neo
>
>Specifying all sibling scope discriminators is the black magic of block join
>(if it ever works). Please get back with the parsed query (from debugQuery=true)
>and the actual/expected result. Anyway, explicitly resolving scopes
>(type_s:body_keywords, type_s:title_keywords) might be much more maintainable.
>
>  which does not return correct results (and I am not quite sure what it 
>really does, I just saw it in another thread of this mailing list)
>>
>>Can you help me to understand whether it is possible?
>>Or do I have to give unique types for documents at different levels of 
>>nesting (e.g., type_s:body_keywords & type_s:title_keywords)? I am trying to 
>>avoid, finding a way to specify a path would be much much more preferable. 
>>
>>
>>Thank you in advance and looking forward to hearing from you
>>--
>>Alisa Zhila
>
>
>-- 
>Sincerely yours
>Mikhail Khludnev
>Principal Engineer,
>Grid Dynamics
>
>
>



Seasonal searches in SOLR 5.x

2016-03-21 Thread Ioannis Kirmitzoglou
Hi all,

I would like to implement seasonal date searches on date ranges. I’m using SOLR 
5.4.1 and have indexed date ranges using a DateRangeField (let’s call this 
field date_ranges). 
Each document in SOLR corresponds to a biological sample and each sample was 
collected during a date range that can span from a single day to multiple 
years. For my application it makes sense to enable seasonal searches, ie find 
samples that were collected during a specific period of the year (e.g. summer, 
or February). In this type of search, the year that the sample was collected is 
not relevant, only the days of the year. I’ve been all over SOLR documentation 
and I haven’t been able to find anything that will enable me to do that. The
closest I got was a post with instructions on how to use a spatial field to do 
date searches 
(https://people.apache.org/~hossman/spatial-for-non-spatial-meetup-20130117/). 
Using the logic in that post I was able to come up with a solution but it’s 
rather complex and needs polygon searches (which in turn means installing the 
JTS Topology suite).
Before committing to that I would like to ask for your input and whether 
there’s an easier way to do these types of searches.

Many thanks,

Ioannis

-
Ioannis Kirmitzoglou, PhD
Bioinformatician - Scientific Programmer 
Imperial College, London
www.vectorbase.org
www.vigilab.org



Re: Save Number of words in field

2016-03-21 Thread Ahmet Arslan
Hi Rajesh,

The number of words is already stored (docValues) in the index.
If you don't use index-time boosts, the length of the field can be restored
(with some precision loss) from the field length norm.
http://lucene.apache.org/core/5_5_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html
But I am not sure how you can retrieve that info in the result list.

Ahmet



On Monday, March 21, 2016 6:15 PM, "G, Rajesh"  wrote:
Hi,

When indexing sentences, I want to store the number of words in the sentence in
a field that I can use with other queries later for word-count matching. Please
let me know whether this is possible.

Thanks
Rajesh





Re: BCE dates on solr TrieDateField

2016-03-21 Thread Chris Hostetter

BCE dates have historically been problematic because of ambiguity in both  
the ISO format that we use for representing dates as well as the internal 
java representation, more details...

https://issues.apache.org/jira/browse/SOLR-1899

..the best workaround I can suggest is to use simple numeric fields to
represent your dates -- either as millis since whatever epoch you want, or 
as distinct year, month, day fields.
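
For example, with hypothetical trie-int fields in schema.xml, a BCE date like
-1600-01-10 becomes three plain integers, and ordinary range queries work:

    <field name="year_i"  type="tint" indexed="true" stored="true"/>
    <field name="month_i" type="tint" indexed="true" stored="true"/>
    <field name="day_i"   type="tint" indexed="true" stored="true"/>

    index year_i = -1600, month_i = 1, day_i = 10, then query e.g.
    q=year_i:[-1700 TO -1500]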


: Date: Mon, 21 Mar 2016 12:53:50 -0400
: From: jude mwenda 
: Reply-To: solr-user@lucene.apache.org
: To: solr-user@lucene.apache.org
: Subject: BCE dates on solr TrieDateField
: 
: Hey,
: 
: I hope this email finds you well. I have a solr.TrieDateField and I am
: trying to send -ve dates to this field. Does the TrieDateField allow for
: -ve dates? when I push the date -1600-01-10 to solr i get 1600-01-10 as the
: date registered. Please advise.
: 
: -- 
: Regards,
: 
: Jude Mwenda
: 

-Hoss
http://www.lucidworks.com/


Re: How fast indexing?

2016-03-21 Thread fabigol
Erick,
In fact, I looked at the % of CPU. The % often changed, but sometimes it was
very low (<10%) while the memory was heavy. I think there was a problem, so I
cut off the indexing.
I don't index documents, but rather some data from a Postgres database.
If I decrease the number of fields, does that save time?
Is it normal if the memory is full at 10 GB?

What are the right values for the "autoCommit" parameter?
 




--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-fast-indexing-tp4264994p4265125.html
Sent from the Solr - User mailing list archive at Nabble.com.


BCE dates on solr TrieDateField

2016-03-21 Thread jude mwenda
Hey,

I hope this email finds you well. I have a solr.TrieDateField and I am
trying to send -ve dates to this field. Does the TrieDateField allow for
-ve dates? when I push the date -1600-01-10 to solr i get 1600-01-10 as the
date registered. Please advise.

-- 
Regards,

Jude Mwenda


Re: How fast indexing?

2016-03-21 Thread fabigol
Amit Jha,
do you have several Solr servers with SolrCloud?




--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-fast-indexing-tp4264994p4265122.html
Sent from the Solr - User mailing list archive at Nabble.com.


Save Number of words in field

2016-03-21 Thread G, Rajesh
Hi,

When indexing sentences, I want to store the number of words in the sentence in
a field that I can use with other queries later for word-count matching. Please
let me know whether this is possible.

Thanks
Rajesh







solr

2016-03-21 Thread jude mwenda
Hey,

I hope this email finds you well. I have a solr.TrieDateField and I am
trying to send -ve dates to this field. Does the TrieDateField allow for
-ve dates? when I push the date -1600-01-10 to solr i get 1600-01-10 as the
date registered. Please advise.
-- 
Regards,

Jude Mwenda


Re: Indexing using CSV

2016-03-21 Thread Paul Hoffman
On Sun, Mar 20, 2016 at 06:11:32PM -0700, Jay Potharaju wrote:
> Hi,
> I am trying to index some data using csv files. The data contains
> description column, which can include quotes, comma, LF/CR & other special
> characters.
> 
> I have it working but run into an issue with the following error
> 
> line=5,can't read line: 5 values={NO LINES AVAILABLE}.
> 
> What is the best way to debug this issue and secondly how do other people
> handle indexing data using csv data.

I would concentrate first on getting the CSV reader working verifiably, 
which might be the hardest part -- CSV is not a file format, it's a 
hodgepodge.

Paul.

-- 
Paul Hoffman 
Systems Librarian
Fenway Libraries Online
c/o Wentworth Institute of Technology
550 Huntington Ave.
Boston, MA 02115
(617) 442-2384 (FLO main number)


Re: Deploy solr on glassfish

2016-03-21 Thread Daniel Collins
You have already asked this question, and there is a thread ongoing on
this.

To quote the previous thread, Solr is no longer a webapp that can be
deployed on any servlet container, it is now a black-box application, so
you should just deploy Solr as it is, and then connect to it yourself,
which you are already discussing in the other thread.

On 21 March 2016 at 11:16, Adel Mohamed Khalifa 
wrote:

> Hello All,
>
>
>
> I have solr-5.3.0 installed and made my core, but when I try to
> deploy it on GlassFish following the steps in
> "https://wiki.apache.org/solr/SolrGlassfish", the page calls for a solr.war
> file, which isn't found in my opt/solr/dist.
>
>
>
> Where I can find it.
>
>
>
> Regards,
> Adel Khalifa
>
>
>
>


Re: Issue Running Solr

2016-03-21 Thread Erick Erickson
What does the Solr log say? That usually gives you a better
idea of what the root cause is, the script really doesn't have
access to the root cause.

Best,
Erick

On Mon, Mar 21, 2016 at 5:54 AM, Salman Ansari 
wrote:

> Hi,
>
> I am facing an issue in running Solr server. I tried different approaches
> and still receive the following error
>
> "ERROR: Solr at http://localhost:8983/solr did not come online within 30
> seconds"
>
> I tried running the following commands
>
> 1) solr -e cloud
> 2) solr.cmd start -cloud -p 8983 -s
> "C:\Solr\Solr-5.3.1\solr-5.3.1\example\cloud\node1" -h [myserver] -z
> "[server_ip]:2181,[server2_hostname]:2181,[server3_hostname]:2181"
>
> I tried running the commands multiple times but still get the same result.
>
> Are there any possible reasons why I am receiving this multiple times? Any
> possible solutions?
>
> Note: I tried this after a Windows update and restarting.
>
> Regards,
> Salman
>


Re: SolrCloud - Fails to delete documents when some shard is down

2016-03-21 Thread Erick Erickson
Hmmm, you say "where I have many shards and
can't have one problem  causing no deletion of old data.".

You then have a shard that, when it comes back up still
has all the old data and that _is_ acceptable? Seems like
that would be jarring to the users when some portion of the
docs in their collection reappeared...

But no, there's no similar option for update that I know of.
Solr tries very hard for consistency and this would lead
to an inconsistent data state.

What is the root cause of your shard going down? That's
the fundamental problem here...

Best,
Erick

On Mon, Mar 21, 2016 at 7:08 AM, Tali Finelt  wrote:

> Hi,
>
> I am using Solr 4.10.2.
>
> When one of the shards in my environment is down and fails to recover -
> The process of deleting documents from other shards fails as well.
>
> For example,
> When running:
> https://<host>:8983/solr/<collection>/update?stream.body=<delete><query>*:*</query></delete>&commit=true
>
> I get the following error message:
> No registered leader was found after waiting for 4000ms ,
> collection:  slice:  
>
> This causes a problem in a big environment where I have many shards and
> can't have one problem  causing no deletion of old data.
>
> Is there a way around that?
>
> To query data in such cases, I use the shards.tolerant=true parameter to
> get results even if some shards are down.
> Is there something similar for this case?
>
> Thanks,
> Tali
>
>
>
>


NPE from UpdateLog.lookup(UpdateLog.java:706) (Version 4.8.1)

2016-03-21 Thread Shay Sofer
Hi all,

During daily work I got an NPE at the mentioned line; the full method is below.
Does anyone know if it's a known bug? Should I open a ticket for Solr?

Thanks in advance, 
Shay.

public Object lookup(BytesRef indexedId) {
683 LogPtr entry;
684 TransactionLog lookupLog;
685
686 synchronized (this) {
687   entry = map.get(indexedId);
688   lookupLog = tlog;  // something found in "map" will always be in "tlog"
689   // SolrCore.verbose("TLOG: lookup: for id ",indexedId.utf8ToString(),"in map",System.identityHashCode(map),"got",entry,"lookupLog=",lookupLog);
690   if (entry == null && prevMap != null) {
691 entry = prevMap.get(indexedId);
692 // something found in prevMap will always be found in preMapLog (which could be tlog or prevTlog)
693 lookupLog = prevMapLog;
694 // SolrCore.verbose("TLOG: lookup: for id ",indexedId.utf8ToString(),"in prevMap",System.identityHashCode(map),"got",entry,"lookupLog=",lookupLog);
695   }
696   if (entry == null && prevMap2 != null) {
697 entry = prevMap2.get(indexedId);
698 // something found in prevMap2 will always be found in preMapLog2 (which could be tlog or prevTlog)
699 lookupLog = prevMapLog2;
700 // SolrCore.verbose("TLOG: lookup: for id ",indexedId.utf8ToString(),"in prevMap2",System.identityHashCode(map),"got",entry,"lookupLog=",lookupLog);
701   }
702
703   if (entry == null) {
704 return null;
705   }
706   lookupLog.incref();
707 }
708
709 try {
710   // now do the lookup outside of the sync block for concurrency
711   return lookupLog.lookup(entry.pointer);
712 } finally {
713   lookupLog.decref();
714 }
715
716   }
717


Re: Explain score is different from score

2016-03-21 Thread Ahmet Arslan


Hi Alessandro,

The OP has different rankings: fl=score and explain's score retrieve
different orders.
I wrote test cases using ClassicSimilarity, but it won't reproduce.
This is really weird. I wonder what is triggering this.

aHmet 


On Monday, March 21, 2016 2:08 PM, Alessandro Benedetti  
wrote:



I would like to add a question: how is the ranking affected?
Do you see the proper ranking in the explain section?
And what about the results? Are they ranked according to the correct score,
or are they ranked by the wrong score?
I got a similar issue, which I am not able to reproduce yet, but it was
really, really weird (in my case the ranking was also messed up).

Cheers


On Mon, Mar 21, 2016 at 7:30 AM, G, Rajesh  wrote:

> Hi Ahmet,
>
> I am using Solr 5.5.0. I am running a single instance with a single core. No
> shards.
>
> I have added  to my schema
> as suggested by Rick Sullivan. Now the scores are the same between explain
> and the score field.
>
> But instead of previous results "Lync - Microsoft Office 365" and
> "Microsoft Office 365" I am getting
>
> {
> "title":"Office 365",
> "score":7.471676
> },
> {
>"title":"Office 365",
> "score":7.471676
> },
>
> If I try NGram title:(Microsoft Ofice 365)
>
> The scores are the same for the top 10 results even though the titles differ
> by a minimum of 3 characters. I have attached my schema.xml so it can help.
>
> Lync - Microsoft Office 365 -- 52.056263
> Microsoft Office 365 -- 52.056263
> Microsoft Office 365 1.0 -- 52.056263
> Microsoft Office 365 14.0 -- 52.056263
> Microsoft Office 365 14.3 -- 52.056263
> Microsoft Office 365 14.4 -- 52.056263
> Microsoft Office 365 14.5(Mac) -- 52.056263
> Microsoft Office 365 15.0 -- 52.056263
> Microsoft Office 365 16.0 -- 52.056263
> Microsoft Office 365 4.0 -- 52.056263
> Microsoft Office 365 E4 -- 52.056263
> Microsoft Mail Protection Reports for Office 365 15.0 -- 50.215454
>
> Thanks
> Rajesh
>
>
>
>
> -Original Message-
> From: Ahmet Arslan [mailto:iori...@yahoo.com]
> Sent: Sunday, March 20, 2016 2:10 AM
> To: solr-user@lucene.apache.org; G, Rajesh ;
> r...@ricksullivan.net
> Subject: Re: Explain score is different from score
>
> Hi Rick and Rajesh,
>
> I wasn't able re-produce this neither with lucene nor solr.
> What version of solr is this?
> Are you using a sharded request?
>
> @BeforeClass
> public static void beforeClass() throws Exception {
> initCore("solrconfig.xml", "schema.xml");
>
> assertU(adoc("id", "1722669", "title", "Lync - Microsoft Office 365"));
> assertU(adoc("id", "2043876", "title", "Microsoft Office 365"));
>
> assertU(commit());
>
> }
>
> /**
> * Checks whether fl=score equals to Explain's score */ @Test public void
> testExplain() throws Exception { SolrQueryRequest req =
> req(CommonParams.DEBUG_QUERY, "true", "indent", "true", "q",
> "title:(Microsoft Ofice 365)", CommonParams.FL, "id,title,score"); String
> response = h.query(req); System.out.println(response); }
>
> @Test
> public void testExplain() throws Exception {
>
> Analyzer analyzer = new WhitespaceAnalyzer();
>
> Directory directory = new RAMDirectory();
>
> IndexWriterConfig config = new IndexWriterConfig(analyzer);
> config.setSimilarity(new ClassicSimilarity()); IndexWriter iwriter = new
> IndexWriter(directory, config);
>
> Document doc = new Document();
> doc.add(new Field("id", "1722669", TextField.TYPE_STORED)); doc.add(new
> Field("title", "Lync - Microsoft Office 365", TextField.TYPE_STORED));
> iwriter.addDocument(doc);
>
> doc = new Document();
> doc.add(new Field("id", "2043876", TextField.TYPE_STORED)); doc.add(new
> Field("title", "Microsoft Office 365", TextField.TYPE_STORED));
> iwriter.addDocument(doc);
>
>
> iwriter.close();
>
> // Now search the index:
> DirectoryReader reader = DirectoryReader.open(directory); IndexSearcher
> searcher = new IndexSearcher(reader); searcher.setSimilarity(new
> ClassicSimilarity());
>
> QueryParser parser = new QueryParser("title", 

SolrCloud - Fails to delete documents when some shard is down

2016-03-21 Thread Tali Finelt
Hi,

I am using Solr 4.10.2. 

When one of the shards in my environment is down and fails to recover, the
process of deleting documents from the other shards fails as well.

For example,
When running:
https://<host>:8983/solr/<collection>/update?stream.body=<delete><query>*:*</query></delete>&commit=true
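
For reference, the same delete-by-query can also be sent as a POST, which
avoids escaping the XML in the URL (a minimal curl sketch; the host and the
collection name "mycollection" are placeholders):

  curl "https://localhost:8983/solr/mycollection/update?commit=true" \
    -H "Content-Type: text/xml" \
    --data-binary "<delete><query>*:*</query></delete>"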

I get the following error message:
No registered leader was found after waiting for 4000ms, collection:
<collection> slice: <shard>

This causes a problem in a big environment where I have many shards and
can't have one problem preventing old data from being deleted.

Is there a way around that?

To query data in such cases, I use the shards.tolerant=true parameter to
get results even if some shards are down.
Is there something similar for this case?

Thanks,
Tali





RE: Explain score is different from score

2016-03-21 Thread G, Rajesh
Please find my answer inline




-Original Message-
From: Alessandro Benedetti [mailto:abenede...@apache.org]
Sent: Monday, March 21, 2016 5:38 PM
To: solr-user@lucene.apache.org
Subject: Re: Explain score is different from score

I would like to add a question: how is the ranking affected? [Rajesh] The
score difference was more than 1.
Do you see the proper ranking in the explain section? [Rajesh] No, it was
different [score and explain score].
And what about the results? Are they ranked according to the correct score,
or are they ranked by the wrong score? [Rajesh] The results were ordered by
score, not by explain score. For my results to be correct, they should have
been ordered by explain score instead of score.
I got a similar issue, which I am not able to reproduce yet, but it was really
really weird (in my case the ranking was also messed up).

Cheers

On Mon, Mar 21, 2016 at 7:30 AM, G, Rajesh  wrote:

> Hi Ahmet,
>
> I am using solr 5.5.0. I am running single instance with single core.
> No shards
>
> I have added  to my
> schema as suggested by Rick Sullivan. Now the scores are same between
> explain and score field.
>
> But instead of previous results "Lync - Microsoft Office 365" and
> "Microsoft Office 365" I am getting
>
> {
> "title":"Office 365",
> "score":7.471676
> },
> {
>"title":"Office 365",
> "score":7.471676
> },
>
> If I try NGram title:(Microsoft Ofice 365)
>
> The scores are the same for the top 10 results even though the titles
> differ by a minimum of 3 characters. I have attached my schema.xml so it
> can help.
>
> Lync - Microsoft Office 365: 52.056263
> Microsoft Office 365: 52.056263
> Microsoft Office 365 1.0: 52.056263
> Microsoft Office 365 14.0: 52.056263
> Microsoft Office 365 14.3: 52.056263
> Microsoft Office 365 14.4: 52.056263
> Microsoft Office 365 14.5(Mac): 52.056263
> Microsoft Office 365 15.0: 52.056263
> Microsoft Office 365 16.0: 52.056263
> Microsoft Office 365 4.0: 52.056263
> Microsoft Office 365 E4: 52.056263
> Microsoft Mail Protection Reports for Office 365 15.0: 50.215454
>
> Thanks
> Rajesh
>
>
>
>
> -Original Message-
> From: Ahmet Arslan [mailto:iori...@yahoo.com]
> Sent: Sunday, March 20, 2016 2:10 AM
> To: solr-user@lucene.apache.org; G, Rajesh ;
> r...@ricksullivan.net
> Subject: Re: Explain score is different from score
>
> Hi Rick and Rajesh,
>
> I wasn't able re-produce this neither with lucene nor solr.
> What version of solr is this?
> Are you using a sharded request?
>
> @BeforeClass
> public static void beforeClass() throws Exception {
> initCore("solrconfig.xml", "schema.xml");
>
> assertU(adoc("id", "1722669", "title", "Lync - Microsoft Office
> 365")); assertU(adoc("id", "2043876", "title", "Microsoft Office
> 365"));
>
> assertU(commit());
>
> }
>
> /**
> * Checks whether fl=score equals to Explain's score */ @Test public
> void
> testExplain() throws Exception { SolrQueryRequest req =
> req(CommonParams.DEBUG_QUERY, "true", "indent", "true", "q",
> "title:(Microsoft Ofice 365)", CommonParams.FL, "id,title,score");
> String response = h.query(req); System.out.println(response); }
>
> @Test
> public void testExplain() throws Exception {
>
> 

Issue Running Solr

2016-03-21 Thread Salman Ansari
Hi,

I am facing an issue running the Solr server. I tried different approaches
and still receive the following error:

"ERROR: Solr at http://localhost:8983/solr did not come online within 30
seconds"

I tried running the following commands

1) solr -e cloud
2) solr.cmd start -cloud -p 8983 -s
"C:\Solr\Solr-5.3.1\solr-5.3.1\example\cloud\node1" -h [myserver] -z
"[server_ip]:2181,[server2_hostname]:2181,[server3_hostname]:2181"

I tried running the commands multiple times but still get the same result.

Are there any possible reasons why I keep receiving this error? Any possible
solutions?

Note: I tried this after a Windows update and restarting.

Regards,
Salman
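
For reference, one way to get more detail than the 30-second timeout message
(a sketch, assuming the default Solr 5.x layout on Windows):

  bin\solr.cmd status
  type server\logs\solr.log

The status command reports whether the node actually came up, and solr.log
usually contains the underlying startup error.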


Re: Explain score is different from score

2016-03-21 Thread Alessandro Benedetti
I would like to add a question: how is the ranking affected?
Do you see the proper ranking in the explain section?
And what about the results? Are they ranked according to the correct score,
or are they ranked by the wrong score?
I got a similar issue, which I am not able to reproduce yet, but it was
really really weird (in my case the ranking was also messed up).

Cheers

On Mon, Mar 21, 2016 at 7:30 AM, G, Rajesh  wrote:

> Hi Ahmet,
>
> I am using solr 5.5.0. I am running single instance with single core. No
> shards
>
> I have added  to my schema
> as suggested by Rick Sullivan. Now the scores are same between explain and
> score field.
>
> But instead of previous results "Lync - Microsoft Office 365" and
> "Microsoft Office 365" I am getting
>
> {
> "title":"Office 365",
> "score":7.471676
> },
> {
>"title":"Office 365",
> "score":7.471676
> },
>
> If I try NGram title:(Microsoft Ofice 365)
>
> The scores are the same for the top 10 results even though the titles
> differ by a minimum of 3 characters. I have attached my schema.xml so it
> can help.
>
> Lync - Microsoft Office 365: 52.056263
> Microsoft Office 365: 52.056263
> Microsoft Office 365 1.0: 52.056263
> Microsoft Office 365 14.0: 52.056263
> Microsoft Office 365 14.3: 52.056263
> Microsoft Office 365 14.4: 52.056263
> Microsoft Office 365 14.5(Mac): 52.056263
> Microsoft Office 365 15.0: 52.056263
> Microsoft Office 365 16.0: 52.056263
> Microsoft Office 365 4.0: 52.056263
> Microsoft Office 365 E4: 52.056263
> Microsoft Mail Protection Reports for Office 365 15.0: 50.215454
>
> Thanks
> Rajesh
>
>
>
>
> -Original Message-
> From: Ahmet Arslan [mailto:iori...@yahoo.com]
> Sent: Sunday, March 20, 2016 2:10 AM
> To: solr-user@lucene.apache.org; G, Rajesh ;
> r...@ricksullivan.net
> Subject: Re: Explain score is different from score
>
> Hi Rick and Rajesh,
>
> I wasn't able re-produce this neither with lucene nor solr.
> What version of solr is this?
> Are you using a sharded request?
>
> @BeforeClass
> public static void beforeClass() throws Exception {
> initCore("solrconfig.xml", "schema.xml");
>
> assertU(adoc("id", "1722669", "title", "Lync - Microsoft Office 365"));
> assertU(adoc("id", "2043876", "title", "Microsoft Office 365"));
>
> assertU(commit());
>
> }
>
> /**
> * Checks whether fl=score equals to Explain's score */ @Test public void
> testExplain() throws Exception { SolrQueryRequest req =
> req(CommonParams.DEBUG_QUERY, "true", "indent", "true", "q",
> "title:(Microsoft Ofice 365)", CommonParams.FL, "id,title,score"); String
> response = h.query(req); System.out.println(response); }
>
> @Test
> public void testExplain() throws Exception {
>
> Analyzer analyzer = new WhitespaceAnalyzer();
>
> Directory directory = new RAMDirectory();
>
> IndexWriterConfig config = new IndexWriterConfig(analyzer);
> config.setSimilarity(new ClassicSimilarity()); IndexWriter iwriter = new
> IndexWriter(directory, config);
>
> Document doc = new Document();
> doc.add(new Field("id", "1722669", TextField.TYPE_STORED)); doc.add(new
> Field("title", "Lync - Microsoft Office 365", TextField.TYPE_STORED));
> iwriter.addDocument(doc);
>
> doc = new Document();
> doc.add(new Field("id", "2043876", TextField.TYPE_STORED)); doc.add(new
> Field("title", "Microsoft Office 365", TextField.TYPE_STORED));
> iwriter.addDocument(doc);
>
>
> iwriter.close();
>
> // Now search the index:
> DirectoryReader reader = DirectoryReader.open(directory); IndexSearcher
> searcher = new IndexSearcher(reader); searcher.setSimilarity(new
> ClassicSimilarity());
>
> QueryParser parser = new QueryParser("title", analyzer); Query query =
> parser.parse("Microsoft Ofice 365"); ScoreDoc[] hits =
> searcher.search(query, 10).scoreDocs;
>
> Assert.assertEquals(2, hits.length);
>
> // Iterate through the results:
> for (int i = 0; i < hits.length; i++) {
>
> Document hitDoc = searcher.doc(hits[i].doc); Explanation explanation =
> searcher.explain(query, 

Deploy solr on glassfish

2016-03-21 Thread Adel Mohamed Khalifa
Hello All,

 

I have installed solr-5.3.0 and created my core, but when I try to deploy it
on GlassFish following the steps in
"https://wiki.apache.org/solr/SolrGlassfish", the guide calls for a solr.war
file, which I can't find in my opt/solr/dist directory.

Where can I find it?

 

Regards,
Adel Khalifa 

 



Re: Boosts for relevancy (shopping products)

2016-03-21 Thread Alessandro Benedetti
Mmm, maybe I didn't explain properly: all the fields you have in the index
for the products could be used to design features.
Of course my list was an example, but when processing clicks you should
first take into consideration all the features you can extract that should
affect your ranking algorithm.
If you can give a glimpse of your schema, we can help in giving you some
draft features :)

Cheers

On Fri, Mar 18, 2016 at 4:57 PM, Robert Brown  wrote:

> Thanks, would be a great idea but unfortunately we don't have that sort of
> granularity of features.
>
> Can definitely use the category of clicked products though, sounds like a
> good enough start.
>
>
>
>
>
> On 03/18/2016 04:36 PM, Alessandro Benedetti wrote:
>
>> Actually if you are able to collect past ( or future signals) like clicks
>> or purchase, i would rather focus on the features of your products rather
>> than the products themselves.
>> What will happen is that you are going to be able rank in a better way
>> products based on how their feature should affect the score.
>> i.e.
>> after you trained your model you realize that people searching for
>> computer
>> gadgets are more likely to click and buy :
>> specific brands - apple compatible - low energy consumption - high user
>> rating  ect ect products
>>
>> At this point even new products that will arrive, which have that set of
>> features, are going to be boosted.
>> Even if you haven't seen them at all.
>>
>> Cheers
>>
>> On Fri, Mar 18, 2016 at 4:21 PM, Robert Brown 
>> wrote:
>>
>> It's also worth mentioning that our platform contains shopping products in
>>> every single category, and will be searched by absolutely anyone, via an
>>> API made available to various websites, some niche, some not.
>>>
>>> If those websites are category specific, ie, electrical goods, then we
>>> could boost on certain categories for a given website, but if they're
>>> also
>>> broad, is this even possible?
>>>
>>> I guess we could track individual users and build up search-histories to
>>> try and guide us, but I don't see many hits being made on repeat users.
>>>
>>> Recording clicks on products could also be used to boost individual
>>> products for specific keywords - I'm beginning to think this is actually
>>> our best hope?  e.g.  A multi-valued field containing keywords that
>>> resulted in a click on that product.
>>>
>>>
>>>
>>>
>>>
>>> On 03/18/2016 04:14 PM, Robert Brown wrote:
>>>
>>> That does sound rather useful!

 We currently have it set to 0.1



 On 03/18/2016 04:13 PM, Nick Vasilyev wrote:

> Tie does quite a bit; without it, only the highest-weighted field that has
> the term will be included in the relevance score. Tie lets you include the
> other fields that match as well.
> On Mar 18, 2016 10:40 AM, "Robert Brown"  wrote:
>
> Thanks for the added input.
>
>> I'll certainly look into the machine learning aspect, will be good to
>> put
>> some basic knowledge I have into practice.
>>
>> I'd been led to believe the tie parameter didn't actually do a lot.
>> :-/
>>
>>
>>
>> On 03/18/2016 12:07 PM, Nick Vasilyev wrote:
>>
>> I work with a similar catalog; except our data is especially bad.
>> We've
>>
>>> found that several things helped:
>>>
>>> - Item level grouping (group same item sold by multiple vendors).
>>> Rank
>>> items with more vendors a bit higher.
>>> - Include a boost function for other attributes, such as an original
>>> image
>>> of the product
>>> - Rank items a bit higher if they have data from an external catalog
>>> like
>>> IceCat
>>> - For relevance and performance, we have several fields that we copy
>>> data
>>> into. High value fields get copied into a high weighted field, while
>>> lower
>>> value fields like description get copied into a lower weighted field.
>>> These
>>> fields are the backbone of our qf parameter, with other fields adding
>>> additional boost.
>>> - Play around with the tie parameter for edismax, we found that it
>>> makes
>>> quite a big difference.
>>>
>>> Hope this helps.
>>>
>>> On Fri, Mar 18, 2016 at 6:19 AM, Alessandro Benedetti <
>>> abenede...@apache.org
>>>
>>> wrote:
>>>
 In a relevancy problem I would repeat what my colleagues already pointed
 out: data is key. We need to understand our data first of all, before we
 can understand what is relevant and what is not.
 Once we specify a ground floor which makes sense (your basic approach +
 proper schema configuration as suggested + a properly configured request
 handler seems a good start to me).

 At this point if you are still not happy with the relevancy (i.e.
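
As an illustration of the qf weighting and tie parameter discussed in this
thread, a minimal edismax parameter sketch (field names and weights are
hypothetical):

  q=office 365
  defType=edismax
  qf=title_high^10 keywords_mid^4 description_low^1
  tie=0.1

With tie=0, only the best-matching field contributes a term's score; with
tie=1, all matching fields are summed; values in between blend the two
behaviours.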

Re: How fast indexing?

2016-03-21 Thread fabigol
For indexing I use DIH.
I found this link for SolrJ indexing.
Is it quicker with SolrJ?
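
For comparison, a minimal SolrJ indexing sketch for SolrCloud (the collection
name "mycollection", the ZooKeeper address, and the field values are all
hypothetical):

  import org.apache.solr.client.solrj.impl.CloudSolrClient;
  import org.apache.solr.common.SolrInputDocument;

  import java.util.ArrayList;
  import java.util.List;

  public class BulkIndexer {
    public static void main(String[] args) throws Exception {
      // CloudSolrClient talks to ZooKeeper and routes documents to the shard leaders
      CloudSolrClient client = new CloudSolrClient("localhost:2181");
      client.setDefaultCollection("mycollection");

      List<SolrInputDocument> batch = new ArrayList<>();
      for (int i = 0; i < 100000; i++) {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", Integer.toString(i));
        doc.addField("title", "document " + i);
        batch.add(doc);
        // send documents in batches instead of one update per document
        if (batch.size() == 1000) {
          client.add(batch);
          batch.clear();
        }
      }
      if (!batch.isEmpty()) {
        client.add(batch);
      }
      client.commit();
      client.close();
    }
  }

Batching (and running several indexing threads) is usually what makes a SolrJ
client faster than a single DIH run, rather than SolrJ itself.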





Re: How fast indexing?

2016-03-21 Thread fabigol
For the JVM I have 8 GB.





Re: How fast indexing?

2016-03-21 Thread fabigol
Thanks for your response.
Solr runs on a server with 6 CPUs and 10 GB of RAM.
We receive new data all the time; for now we index twice per day.
The database contains 5 tables (one of 18k rows, three of 300k rows, and one
of 6 million rows).
The indexing took 6 hours.
I didn't modify the original solrconfig.xml file.
I'm going to try incremental indexing.
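
A minimal sketch of what incremental indexing looks like with DIH's
delta-import, using hypothetical table and column names in data-config.xml:

  <entity name="item" pk="id"
          query="SELECT id, title FROM item"
          deltaQuery="SELECT id FROM item
                      WHERE last_modified > '${dataimporter.last_index_time}'"
          deltaImportQuery="SELECT id, title FROM item
                            WHERE id = '${dataimporter.delta.id}'"/>

It is triggered with command=delta-import instead of command=full-import, so
only rows changed since the last run are re-indexed.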






RE: Explain score is different from score

2016-03-21 Thread G, Rajesh
Hi Ahmet,

I am using Solr 5.5.0, running a single instance with a single core. No shards.

I have added  to my schema as suggested by Rick Sullivan. Now the scores are
the same between explain and the score field.

But instead of previous results "Lync - Microsoft Office 365" and "Microsoft 
Office 365" I am getting

{
  "title":"Office 365",
  "score":7.471676
},
{
  "title":"Office 365",
  "score":7.471676
},

If I try NGram title:(Microsoft Ofice 365)

The scores are the same for the top 10 results even though the titles differ
by a minimum of 3 characters. I have attached my schema.xml so it can help.

Lync - Microsoft Office 365: 52.056263
Microsoft Office 365: 52.056263
Microsoft Office 365 1.0: 52.056263
Microsoft Office 365 14.0: 52.056263
Microsoft Office 365 14.3: 52.056263
Microsoft Office 365 14.4: 52.056263
Microsoft Office 365 14.5(Mac): 52.056263
Microsoft Office 365 15.0: 52.056263
Microsoft Office 365 16.0: 52.056263
Microsoft Office 365 4.0: 52.056263
Microsoft Office 365 E4: 52.056263
Microsoft Mail Protection Reports for Office 365 15.0: 50.215454

Thanks
Rajesh




-Original Message-
From: Ahmet Arslan [mailto:iori...@yahoo.com]
Sent: Sunday, March 20, 2016 2:10 AM
To: solr-user@lucene.apache.org; G, Rajesh ; 
r...@ricksullivan.net
Subject: Re: Explain score is different from score

Hi Rick and Rajesh,

I wasn't able to reproduce this with either Lucene or Solr.
What version of solr is this?
Are you using a sharded request?

@BeforeClass
public static void beforeClass() throws Exception {
  initCore("solrconfig.xml", "schema.xml");

  assertU(adoc("id", "1722669", "title", "Lync - Microsoft Office 365"));
  assertU(adoc("id", "2043876", "title", "Microsoft Office 365"));

  assertU(commit());
}

/**
 * Checks whether fl=score equals to Explain's score
 */
@Test
public void testExplain() throws Exception {
  SolrQueryRequest req = req(CommonParams.DEBUG_QUERY, "true",
      "indent", "true",
      "q", "title:(Microsoft Ofice 365)",
      CommonParams.FL, "id,title,score");
  String response = h.query(req);
  System.out.println(response);
}

@Test
public void testExplain() throws Exception {

  Analyzer analyzer = new WhitespaceAnalyzer();

  Directory directory = new RAMDirectory();

  IndexWriterConfig config = new IndexWriterConfig(analyzer);
  config.setSimilarity(new ClassicSimilarity());
  IndexWriter iwriter = new IndexWriter(directory, config);

  Document doc = new Document();
  doc.add(new Field("id", "1722669", TextField.TYPE_STORED));
  doc.add(new Field("title", "Lync - Microsoft Office 365", TextField.TYPE_STORED));
  iwriter.addDocument(doc);

  doc = new Document();
  doc.add(new Field("id", "2043876", TextField.TYPE_STORED));
  doc.add(new Field("title", "Microsoft Office 365", TextField.TYPE_STORED));
  iwriter.addDocument(doc);

  iwriter.close();

  // Now search the index:
  DirectoryReader reader = DirectoryReader.open(directory);
  IndexSearcher searcher = new IndexSearcher(reader);
  searcher.setSimilarity(new ClassicSimilarity());

  QueryParser parser = new QueryParser("title", analyzer);
  Query query = parser.parse("Microsoft Ofice 365");
  ScoreDoc[] hits = searcher.search(query, 10).scoreDocs;

  Assert.assertEquals(2, hits.length);

  // Iterate through the results:
  for (int i = 0; i < hits.length; i++) {
    Document hitDoc = searcher.doc(hits[i].doc);
    Explanation explanation = searcher.explain(query, hits[i].doc);

    Assert.assertEquals("score from explain should equal to ScoreDoc.score!",
        hits[i].score, explanation.getValue(), 0.0);
  }

  reader.close();
  directory.close();
}





On Saturday, March 19, 2016 7:54 AM, "G, Rajesh"  wrote:
I don't use boost at index time or at query time.




Solr 5.3 certificate client authentication

2016-03-21 Thread Abdul Rahim, Muzammil (Nokia - IN/Bangalore)

Hi,

My basic requirement is to secure the Solr admin page. I thought of securing
it using certificate authentication, so I followed the link below:

https://cwiki.apache.org/confluence/display/solr/Enabling+SSL

With the help of the above link I was able to secure my admin page, but my
application runs on a Tomcat server and puts data into and retrieves data
from Solr. I tried importing the certificate into Tomcat's truststore, but
since I had set SOLR_SSL_NEED_CLIENT_AUTH=true, my call to Solr was not
successful. Can you help me with some suggestions to address this?

With regards
Muzammil A
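
For what it's worth: with SOLR_SSL_NEED_CLIENT_AUTH=true, the calling JVM
(Tomcat here) has to present its own certificate, not just trust Solr's. A
minimal sketch of the standard JSSE system properties such a client needs
(paths and passwords are placeholders):

  -Djavax.net.ssl.keyStore=/path/to/client-keystore.jks
  -Djavax.net.ssl.keyStorePassword=changeit
  -Djavax.net.ssl.trustStore=/path/to/truststore.jks
  -Djavax.net.ssl.trustStorePassword=changeit

The client certificate must in turn be trusted by the truststore Solr uses.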