crawling all links of same domain in nutch in solr

2014-07-28 Thread Vivekanand Ittigi
Hi,

Can anyone tell me how to crawl all other pages of the same domain?
For example, I'm feeding the website http://www.techcrunch.com/ in seed.txt.

Following property is added in nutch-site.xml


<property>
  <name>db.ignore.internal.links</name>
  <value>false</value>
  <description>If true, when adding new links to a page, links from
  the same host are ignored.  This is an effective way to limit the
  size of the link database, keeping only the highest quality
  links.
  </description>
</property>

And following is added in regex-urlfilter.txt

# accept anything else
+.
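
A domain-restricted variant of this filter is the usual way to stay on one
site (a sketch, untested; the pattern below assumes the techcrunch.com seed):

# accept only the seed domain
+^https?://([a-z0-9-]+\.)*techcrunch\.com/
# reject everything else
-.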

Note: if I add http://www.tutorialspoint.com/ in seed.txt, I'm able to
crawl all its other pages, but not techcrunch.com's pages, though it has
many other pages too.

Please help..?

Thanks,
Vivek


Getting Started with Enterprise Search using Apache Solr

2014-07-28 Thread Xavier Morera
Hi. Most of the members here are already seasoned search professionals.
However, I believe there may also be a few who joined because they want to
get started on search, and IMHO Solr is the best way to start.


Therefore I wanted to post a link to a course that I created on Getting
Started with Enterprise Search using Apache Solr. For some it might be a good
way to start learning. If you are already a search professional you may
not benefit greatly, but if you can provide feedback that would be
great, as I want to create more trainings to help people get started on
search.

It is a Pluralsight training, so if you are not a subscriber, just create a
trial account and you have 10 days to watch. If you have questions, let me
know. You can reach me through here or @xmorera on Twitter.

Here is the course
http://pluralsight.com/training/Courses/TableOfContents/enterprise-search-using-apache-solr


PS: Pluralsight is also a great way to learn so I really recommend it.



-- 
*Xavier Morera*
email: xav...@familiamorera.com
CR: +(506) 8849 8866
US: +1 (305) 600 4919
skype: xmorera


Re: how to extract stats component with solrj 4.9.0

2014-07-28 Thread Edith Au
Thanks Shawn.


I found the method FieldStatsInfo.getFacets() in the Solr 4.9.0 docs,
but it seems to be missing from my SolrJ 4.9.0 distribution.
Could this be a bug, or do I have a bad distro?
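
For reference, a minimal sketch of the call I expected to work, based on the
javadoc (assuming getFacets() is present; the field and facet names match my
query below):

Map<String, FieldStatsInfo> stats = response.getFieldStatsInfo();
FieldStatsInfo countStats = stats.get("count");              // stats.field=count
Map<String, List<FieldStatsInfo>> byBlock = countStats.getFacets();
for (FieldStatsInfo blockStats : byBlock.get("block_num")) { // stats.facet=block_num
    System.out.println(blockStats.getName() + " sum=" + blockStats.getSum());
}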






On Mon, Jul 28, 2014 at 9:43 AM, Shawn Heisey  wrote:

> On 7/28/2014 10:08 AM, Edith Au wrote:
> > I tried getFieldStatsInfo().  I got a stats of the stats with this method
> > (ie. sum (sum(count)) of all the group'ed results.  But it is not what I
> > want.  I want a list of stats (ie. sum(count), group by block num).  With a
> > debugger, I could see the information I want in this private object
> > response._statsInfo.
> >
> > I could grab the information I want with response.toString().  That's what
> > I did and parse the string myself for now. :(
>
> Edith,
>
> Everything that's in the QueryResponse is available as a Java object,
> with getHeader and getResponse being the gateways.  The NamedList object
> type that these methods return is a very compact and useful structure,
> created specifically for Solr.  Once you have a NamedList, the data that
> you want might be buried, so the findRecursive method (available with
> SolrJ 4.4.0 and later) can be very useful to navigate the object easily.
>
> Thanks,
> Shawn
>
>


Re: how to extract stats component with solrj 4.9.0

2014-07-28 Thread Shawn Heisey
On 7/28/2014 10:08 AM, Edith Au wrote:
> I tried getFieldStatsInfo().  I got a stats of the stats with this method
> (ie. sum (sum(count)) of all the group'ed results.  But it is not what I
> want.  I want a list of stats (ie. sum(count), group by block num).  With a
> debugger, I could see the information I want in this private object
> response._statsInfo.
>
> I could grab the information I want with response.toString().  That's what
> I did and parse the string myself for now. :(

Edith,

Everything that's in the QueryResponse is available as a Java object,
with getHeader and getResponse being the gateways.  The NamedList object
type that these methods return is a very compact and useful structure,
created specifically for Solr.  Once you have a NamedList, the data that
you want might be buried, so the findRecursive method (available with
SolrJ 4.4.0 and later) can be very useful to navigate the object easily.
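
As a sketch only (untested, and the path assumes the standard stats layout of
stats > stats_fields > field > facets, so verify it against your actual
response first):

NamedList<Object> nl = response.getResponse();
Object found = nl.findRecursive("stats", "stats_fields", "count",
                                "facets", "block_num");
if (found instanceof NamedList) {
    @SuppressWarnings("unchecked")
    NamedList<Object> byBlock = (NamedList<Object>) found;
    for (int i = 0; i < byBlock.size(); i++) {
        // Each entry is one block_num value with its own set of stats.
        NamedList<?> blockStats = (NamedList<?>) byBlock.getVal(i);
        System.out.println("block_num=" + byBlock.getName(i)
                + " sum=" + blockStats.get("sum"));
    }
}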

Thanks,
Shawn



Re: how to extract stats component with solrj 4.9.0

2014-07-28 Thread Edith Au
I tried getFieldStatsInfo().  I got a stats of the stats with this method
(ie. sum (sum(count)) of all the group'ed results.  But it is not what I
want.  I want a list of stats (ie. sum(count), group by block num).  With a
debugger, I could see the information I want in this private object
response._statsInfo.

I could grab the information I want with response.toString().  That's what
I did and parse the string myself for now. :(




On Sun, Jul 27, 2014 at 1:41 PM, Erick Erickson 
wrote:

> Have you tried the getFieldStatsInfo method in the QueryResponse object?
>
> Best,
> Erick
>
>
> On Sat, Jul 26, 2014 at 3:36 PM, Edith Au  wrote:
>
> > I have a solr query like this
> >
> > q=categories:cat1 OR
> > categories:cat2&stats=true&stats.field=count&stats.facet=block_num
> >
> > Basically, I want to get the sum(count) group by block num.
> >
> >
> > This query works on a browser. But with solrj, I could not access the
> stats
> > fields from the Response obj. I can do a response.getFieldStatsInfo().
> But
> > it is not what I want. Here is how I construct the query
> >
> > SolrQuery query = new SolrQuery(q);
> > query.add("stats", "true");
> > query.add("stats.field", "count");
> > query.add("stats.facet", "block_num");
> >
> > With a debugger, I could see that the response has a private statsInfo
> > object and it has the information I am looking for. But there is no api
> to
> > access the private object.
> >
> > I would like to know if there is
> >
> >    1. a better way to construct my query. I only need the sum of (count),
> >       group by block num
> >    2. a way to access the hidden statsInfo object in the query response?
> >       [it is so frustrating. I can see all the info I need in the private
> >       obj on my debugger!]
> >
> > Thanks!
> >
> >
> > ps. I posted this question on stackoverflow but have gotten no response so
> > far.  Any help will be greatly appreciated!
> >
>


Re: To warm the whole cache of Solr other than the only autowarmcount

2014-07-28 Thread Erick Erickson
bq: autowarmcount=1024...

That's the point: this is quite a high number in my
experience.

I've rarely seen numbers above 128 show much of
any improvement. I've seen a large number of
installations use much smaller autowarm numbers,
as in the 16-32 range and be quite content.

I _really_ recommend you try to use much smaller
numbers then _measure_ whether the first few
queries after a commit show unacceptable
response times before trying to make things
"better". This really feels like premature
optimization.

Of course you know your problem space better than
I do, it's just that I've spent too much of my
professional life fixing the wrong "problem"; I've
become something of a "measure first" curmudgeon.
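
For concreteness, the kind of solrconfig.xml setting I'd start from
(sizes are illustrative, not a recommendation for your data):

<filterCache class="solr.FastLRUCache"
             size="512"
             initialSize="512"
             autowarmCount="32"/>

Then measure the first queries after a commit and adjust.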

FWIW,
Erick


On Sun, Jul 27, 2014 at 10:48 PM, YouPeng Yang 
wrote:

> Hi Erick
>
> We do the DIH job from the DB and commit frequently. It takes a long time
> to autowarm the filterCache after a commit or soft commit happens, even
> when setting autowarmCount=1024, which I do think is small enough.
> So the idea came up: could we directly pass the reference of the old
> caches over to the new caches, so that the autowarm processing takes
> much less time?
>
>
>
> 2014-07-28 2:30 GMT+08:00 Erick Erickson :
>
> > Why do you think you _need_ to autowarm the entire cache? It
> > is, after all, an LRU cache, the theory being that the most recent
> > queries are most likely to be reused.
> >
> > Personally I'd run some tests on using small autowarm counts
> > before getting at all mixed up in some complex scheme that
> > may not be useful at all. Say an autowarm count of 16. Then
> > measure using that, then say 32 then... Insure you have a real
> > problem before worrying about a solution! ;)
> >
> > Best,
> > Erick
> >
> >
> > On Fri, Jul 25, 2014 at 6:45 AM, Shawn Heisey  wrote:
> >
> > > On 7/24/2014 8:45 PM, YouPeng Yang wrote:
> > > > To Matt
> > > >
> > > >   Thank you, your opinion is very valuable, so I have checked the
> > > > source code for how the cache warms up. It seems to just put items
> > > > of the old caches into the new caches.
> > > >   I will pull Mark Miller into this discussion. He is one of the
> > > > developers of Solr whom I had contacted.
> > > >
> > > >  To Mark Miller
> > > >
> > > >    Would you please check out what we are discussing in the last
> > > > two posts. I need your help.
> > >
> > > Matt is completely right.  Any commit can drastically change the Lucene
> > > document id numbers.  It would be too expensive to determine which
> > > numbers haven't changed.  That means Solr must throw away all cache
> > > information on commit.
> > >
> > > Two of Solr's caches support autowarming.  Those caches use queries as
> > > keys and results as values.  Autowarming works by re-executing the top
> N
> > > queries (keys) in the old cache to obtain fresh Lucene document id
> > > numbers (values).  The cache code does take *keys* from the old cache
> > > for the new cache, but not *values*.  I'm very sure about this, as I
> > > wrote the current (and not terribly good) LFUCache.
> > >
> > > Thanks,
> > > Shawn
> > >
> > >
> >
>


Re: Query about vacuum full

2014-07-28 Thread Ameya Aware
yes..

i intended to post this query there.

By mistake, i put it here.

Apologizing

Ameya


On Mon, Jul 28, 2014 at 11:07 AM, Jack Krupansky 
wrote:

> Or are you using ManifoldCF?
>
> -- Jack Krupansky
>
> -Original Message- From: Rafał Kuć
> Sent: Monday, July 28, 2014 11:00 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Query about vacuum full
>
> Hello!
>
> Please refer to PostgreSQL mailing list with this question. This
> question is purely about that database and this mailing list is about
> Solr.
>
> --
> Regards,
> Rafał Kuć
> Performance Monitoring * Log Analytics * Search Analytics
> Solr & Elasticsearch Support * http://sematext.com/
>
>
>> Hi,
>>
>> I am seeing considerable decrease in speed of indexing of documents.
>>
>> I am using PostgreSQL.
>>
>> So is this a right time to do vacuum on PostgreSQL because i am using this
>> since a week.
>>
>> Also, to invoke vacuum full do i just need to go to PostgreSQL command
>> prompt and invoke "VACUUM FULL" command?
>>
>> Thanks,
>> Ameya
>


Re: Query about vacuum full

2014-07-28 Thread Jack Krupansky

Or are you using ManifoldCF?

-- Jack Krupansky

-Original Message- 
From: Rafał Kuć

Sent: Monday, July 28, 2014 11:00 AM
To: solr-user@lucene.apache.org
Subject: Re: Query about vacuum full

Hello!

Please refer to PostgreSQL mailing list with this question. This
question is purely about that database and this mailing list is about
Solr.

--
Regards,
Rafał Kuć
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/



> Hi,
>
> I am seeing considerable decrease in speed of indexing of documents.
>
> I am using PostgreSQL.
>
> So is this a right time to do vacuum on PostgreSQL because i am using this
> since a week.
>
> Also, to invoke vacuum full do i just need to go to PostgreSQL command
> prompt and invoke "VACUUM FULL" command?
>
> Thanks,
> Ameya




Re: Query about vacuum full

2014-07-28 Thread Rafał Kuć
Hello!

Please refer to PostgreSQL mailing list with this question. This
question is purely about that database and this mailing list is about
Solr.

-- 
Regards,
 Rafał Kuć
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


> Hi,

> I am seeing considerable decrease in speed of indexing of documents.

> I am using PostgreSQL.

> So is this a right time to do vacuum on PostgreSQL because i am using this
> since a week.


> Also, to invoke vacuum full do i just need to go to PostgreSQL command
> prompt and invoke "VACUUM FULL" command?


> Thanks,
> Ameya



Re: Understanding the Debug explanations for Query Result Scoring/Ranking

2014-07-28 Thread O. Olson
Thank you very much Chris. I was not aware of debug.explain.structured. It
seems to be what I was looking for. 

Thanks also to Jack Krupansky. Yes, delving into those numbers would be my
next step, but I will get to that later.
O. O.


Chris Hostetter-3 wrote
> Just to be clear, regardless of *which* response writer you use (xml,
> ruby, json, etc...) the default behavior is to include the score
> explanation as a single string which uses tabs/newlines to deal with the
> nesting (this nesting is visible if you view the raw response, no matter
> what ResponseWriter).
>
> You can however add a param indicating that you want the explanation
> information to be returned as *structured data* instead of a simple
> string...
>
> https://wiki.apache.org/solr/CommonQueryParameters#debug.explain.structured
>
> ...if you want to programmatically process debug info, this is the
> recommended way to do so.
>
> -Hoss
> http://www.lucidworks.com/
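
For anyone finding this thread later: the parameter is simply added to the
request, e.g. (core name and query are placeholders):

http://localhost:8983/solr/collection1/select?q=ipod&debugQuery=true&debug.explain.structured=true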





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Understanding-the-Debug-explanations-for-Query-Result-Scoring-Ranking-tp4149137p4149521.html
Sent from the Solr - User mailing list archive at Nabble.com.


Query about vacuum full

2014-07-28 Thread Ameya Aware
Hi,

I am seeing considerable decrease in speed of indexing of documents.

I am using PostgreSQL.

So is this a right time to do vacuum on PostgreSQL because i am using this
since a week.


Also, to invoke vacuum full do i just need to go to PostgreSQL command
prompt and invoke "VACUUM FULL" command?


Thanks,
Ameya


Re: Bloom filter

2014-07-28 Thread Per Steffensen
Yes I found that one, along with SOLR-3950. Well at least it seems like 
the support is there in Lucene. I will figure out myself how to make it 
work via Solr, the way I need it to work. My use-case is not as 
specified in SOLR-1375, but the solution might be the same. Any input is 
of course still very much appreciated.


Regards, Per Steffensen

On 28/07/14 15:42, Lukas Drbal wrote:

Hi Per,

link to jira - https://issues.apache.org/jira/browse/SOLR-1375 Unresolved
;-)

L.


On Mon, Jul 28, 2014 at 1:17 PM, Per Steffensen  wrote:


Hi

Where can I find documentation on how to use Bloom filters in Solr (4.4).
http://wiki.apache.org/solr/BloomIndexComponent seems to be outdated -
there is no BloomIndexComponent included in 4.4 code.

Regards, Per Steffensen








Re: java.lang.OutOfMemoryError: Requested array size exceeds VM limit

2014-07-28 Thread Harald Kirsch

Hi,

the stack trace points to tika, which is likely in the process of 
extracting indexable plain text from some document.


Tika's job is one of the dirtiest you can think of in the whole indexing 
business. We throw all kinds of more or less 
documented/broken/misguided/ill-designed/cruft/truncated documents at it 
and want it to do miracles in understanding that stuff and getting the 
plain text out.


It does quite a good job most of the time, but sometimes it just gets
trapped. There is nearly no chance to get Tika bug-free (whatever that
means when the requirements are ill-defined), so we must live with
accidents like the one you have here, where Tika seemingly reckoned that
it needed a large amount of memory to parse a document.


There are two ways out:

a) You very strictly control whatever document enters Tika by using a
white-list.
b) You don't let Tika run as part of Solr, but take it into a separate
process, let it crash, and restart it automatically.
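
For (b), a minimal sketch of such a worker process (assumes tika-core and
tika-parsers on the classpath; the length cap and one-file-per-run model are
illustrative):

import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import org.apache.tika.Tika;

public class ExternalExtractor {
    public static void main(String[] args) throws Exception {
        Tika tika = new Tika();
        tika.setMaxStringLength(10 * 1024 * 1024); // cap extracted text
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            // If parsing blows up, only this process dies; a supervisor
            // can restart it and skip the offending document.
            System.out.println(tika.parseToString(in));
        }
    }
}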


Regards,
Harald.



On 25.07.2014 19:32, Ameya Aware wrote:

Please find below entire stack trace:


ERROR - 2014-07-25 13:14:22.202; org.apache.solr.common.SolrException;
null:java.lang.RuntimeException: java.lang.OutOfMemoryError: Requested
array size exceeds VM limit
at
org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:790)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:439)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:368)
at
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
at
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at
org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
at
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:636)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.OutOfMemoryError: Requested array size exceeds VM limit
at java.util.Arrays.copyOf(Unknown Source)
at java.lang.AbstractStringBuilder.expandCapacity(Unknown Source)
at java.lang.AbstractStringBuilder.ensureCapacityInternal(Unknown Source)
at java.lang.AbstractStringBuilder.append(Unknown Source)
at java.lang.StringBuilder.append(Unknown Source)
at
org.apache.solr.handler.extraction.SolrContentHandler.characters(SolrContentHandler.java:303)
at
org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
at
org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)
at
org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
at
org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
at
org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
at
org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)
at
org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)
at
org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)
at
org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)
at
org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:278)
at org.apache.tika.parser.txt.TXTParser.parse(TXTParser.java:88)
at or

Re: Bloom filter

2014-07-28 Thread Lukas Drbal
Hi Per,

link to jira - https://issues.apache.org/jira/browse/SOLR-1375 Unresolved
;-)

L.


On Mon, Jul 28, 2014 at 1:17 PM, Per Steffensen  wrote:

> Hi
>
> Where can I find documentation on how to use Bloom filters in Solr (4.4).
> http://wiki.apache.org/solr/BloomIndexComponent seems to be outdated -
> there is no BloomIndexComponent included in 4.4 code.
>
> Regards, Per Steffensen
>



-- 


*Lukáš Drbal*
Software architect

*Socialbakers*
Facebook applications and other sweet stuff

Facebook Preferred Marketing Developer


*+420 739 815 424**lukas.dr...@socialbakers.com
*
*www.socialbakers.com *


Re: /solr/admin/ping causing exceptions in log?

2014-07-28 Thread Nathan Neulinger

Thing is - I wouldn't expect any of the default options mentioned to change the 
behavior intermittently.

i.e. it's working for 95% of the health check requests, it's just the intermittent ones that seem to be cut off... I'm 
inquiring with haproxy devs since it appears that at least one other person on #haproxy is seeing the same behavior. 
Doesn't appear to be specific to solr.


-- Nathan

On 07/27/2014 10:44 PM, Shawn Heisey wrote:

On 7/27/2014 7:23 PM, Nathan Neulinger wrote:

Unfortunately, doesn't look like this clears the symptom.

The ping is responding almost instantly every time. I've tried setting a
15 second timeout on the check, with no change in occurences of the error.

Looking at a packet capture on the server side, there is a clear
distinction between working and failing/error-triggering connections.

It looks like in a "working" case, I see two packets immediately back to
back (one with header, and next a continuation with content) with no ack
in between, followed by ack, rst+ack, rst.

In the failing request, I see the GET request, acked, then the http/1.1
200 Ok response from Solr, a single ack, and then an almost
instantaneous reset sent by the client.


I'm only seeing this on traffic to/from haproxy checks. If I do a simple:

 while [ true ]; do curl -s http://host:8983/solr/admin/ping; done

from the same box, that flood runs with generally 10-20ms request times
and zero errors.


I won't claim to understand what's going on here, but it might be a
matter of the haproxy options.  Here are the options I'm using in the
"defaults" section of the config:

defaults
 log global
     mode    http
 option  httplog
 option  dontlognull
 option  redispatch
 option  abortonclose
 option  http-server-close
 option  http-pretend-keepalive
 retries 1
 maxconn 1024
 timeout connect 1s
 timeout client  5s
 timeout server  30s

One bit of information I came across when I first started setting
haproxy up for Solr is that servlet containers like Jetty and Tomcat
require the "http-pretend-keepalive" option to work properly.  Are you
using this option?

Thanks,
Shawn



--

Nathan Neulinger   nn...@neulinger.org
Neulinger Consulting   (573) 612-1412


Re: Bloom filter

2014-07-28 Thread Shalin Shekhar Mangar
I don't think that issue was ever committed.


On Mon, Jul 28, 2014 at 4:47 PM, Per Steffensen  wrote:

> Hi
>
> Where can I find documentation on how to use Bloom filters in Solr (4.4).
> http://wiki.apache.org/solr/BloomIndexComponent seems to be outdated -
> there is no BloomIndexComponent included in 4.4 code.
>
> Regards, Per Steffensen
>



-- 
Regards,
Shalin Shekhar Mangar.


Re: To warm the whole cache of Solr other than the only autowarmcount

2014-07-28 Thread Shawn Heisey
On 7/28/2014 12:06 AM, YouPeng Yang wrote:
>   No offense to your work, but I am still confused about the cache
> warming process in your explanation. So I checked the warm method of
> FastLRUCache as [1].
>   As far as I see, there is no value refresh during the warm
> processing; *regenerator.regenerateItem* just puts the old value into
> the new cache.

What the cache code does is pass the key and the value from the current
cache entry to the CacheRegenerator.regenerateItem method, which is
defined elsewhere when the cache is created.  I thought it was just the
key, but now that I look closer, it is both.  Exactly what is done with
that information is completely up to the regenerator object.  The cache
unit tests use a NoOpRegenerator, which simply populates the new cache
with the entire old entry, including the value.  That is not how things
work with the actual production caches, though.

With filterCache and queryResultCache, the key contains the query, and
the regenerator is set up to execute the query against the live index
and insert the new value in the new cache.  I've never looked deeply
into the regenerator code.
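
To make the contract concrete, a sketch of the interface involved (this
copy-through version behaves like the NoOpRegenerator from the unit tests;
the production regenerators re-execute the key against newSearcher instead):

import java.io.IOException;
import org.apache.solr.search.CacheRegenerator;
import org.apache.solr.search.SolrCache;
import org.apache.solr.search.SolrIndexSearcher;

public class CopyThroughRegenerator implements CacheRegenerator {
    @Override
    public boolean regenerateItem(SolrIndexSearcher newSearcher,
            SolrCache newCache, SolrCache oldCache,
            Object oldKey, Object oldVal) throws IOException {
        // A real filterCache/queryResultCache regenerator re-executes
        // oldKey (a query) here, because Lucene doc ids change on commit.
        newCache.put(oldKey, oldVal);
        return true; // false would stop the autowarming loop early
    }
}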

Thanks,
Shawn



RE: copy EnumField to text field

2014-07-28 Thread Elran Dvir
Do you think that the change I suggested in DocumentBuilder is right, or
should we leave it as is?

The change:
Instead of:
// Perhaps trim the length of a copy field
 Object val = v;

The code will be:
// Perhaps trim the length of a copy field 
Object val = sfield.getType().toExternal(sfield.createField(v, 1.0f));

Thanks.

-Original Message-
From: Jack Krupansky [mailto:j...@basetechnology.com] 
Sent: Monday, July 28, 2014 3:24 PM
To: solr-user@lucene.apache.org
Subject: Re: copy EnumField to text field

Correct - copy field copies the raw, original, source input value, before the 
actual field type has had a chance to process it in any way.

-- Jack Krupansky

-Original Message-
From: Elran Dvir
Sent: Monday, July 28, 2014 8:08 AM
To: solr-user@lucene.apache.org
Subject: RE: copy EnumField to text field

So if I have a document without severity, I can't see severity has its default 
value (0) in the stage of copy fields (in class DocumentBuilder)?

Thanks.

-Original Message-
From: Jack Krupansky [mailto:j...@basetechnology.com]
Sent: Monday, July 28, 2014 2:39 PM
To: solr-user@lucene.apache.org
Subject: Re: copy EnumField to text field

There is a distinction between the original source input value for the indexing 
process and what value is actually indexed. Query searching will see whatever 
is actually indexed, not the original source input value. An URP could 
explicitly set the source input value to the default value if it is missing, 
but you would have to specify an explicit value for the URP to use.

-- Jack Krupansky

-Original Message-
From: Elran Dvir
Sent: Monday, July 28, 2014 4:12 AM
To: solr-user@lucene.apache.org
Subject: RE: copy EnumField to text field

Are you saying that default values are for query and not for indexing?

Thanks.

-Original Message-
From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
Sent: Monday, July 28, 2014 9:38 AM
To: solr-user
Subject: Re: copy EnumField to text field

On Mon, Jul 28, 2014 at 1:31 PM, Elran Dvir  wrote:
> But when no value is sent with severity, and the default of 0 is used, 
> the fix doesn't seem to work.

I guess the default in this case is figured out at the query time because there 
is no empty value as such. So that would be too late for copyField. If I am 
right, then you could probably use UpdateRequestProcessor and set the default 
value explicitly (DefaultValueUpdateProcessorFactory).

Regards,
   Alex.

Personal: http://www.outerthoughts.com/ and @arafalov Solr resources and
newsletter: http://www.solr-start.com/ and @solrstart Solr popularizers
community: https://www.linkedin.com/groups?gid=6713853



Re: copy EnumField to text field

2014-07-28 Thread Jack Krupansky
Correct - copy field copies the raw, original, source input value, before 
the actual field type has had a chance to process it in any way.


-- Jack Krupansky

-Original Message- 
From: Elran Dvir

Sent: Monday, July 28, 2014 8:08 AM
To: solr-user@lucene.apache.org
Subject: RE: copy EnumField to text field

So if I have a document without severity, I can't see severity has its 
default value (0) in the stage of copy fields (in class DocumentBuilder)?


Thanks.

-Original Message-
From: Jack Krupansky [mailto:j...@basetechnology.com]
Sent: Monday, July 28, 2014 2:39 PM
To: solr-user@lucene.apache.org
Subject: Re: copy EnumField to text field

There is a distinction between the original source input value for the 
indexing process and what value is actually indexed. Query searching will 
see whatever is actually indexed, not the original source input value. An 
URP could explicitly set the source input value to the default value if it 
is missing, but you would have to specify an explicit value for the URP to 
use.


-- Jack Krupansky

-Original Message-
From: Elran Dvir
Sent: Monday, July 28, 2014 4:12 AM
To: solr-user@lucene.apache.org
Subject: RE: copy EnumField to text field

Are you saying that default values are for query and not for indexing?

Thanks.

-Original Message-
From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
Sent: Monday, July 28, 2014 9:38 AM
To: solr-user
Subject: Re: copy EnumField to text field

On Mon, Jul 28, 2014 at 1:31 PM, Elran Dvir  wrote:

> But when no value is sent with severity, and the default of 0 is used,
> the fix doesn't seem to work.


I guess the default in this case is figured out at the query time because 
there is no empty value as such. So that would be too late for copyField. If 
I am right, then you could probably use UpdateRequestProcessor and set the 
default value explicitly (DefaultValueUpdateProcessorFactory).


Regards,
  Alex.

Personal: http://www.outerthoughts.com/ and @arafalov Solr resources and
newsletter: http://www.solr-start.com/ and @solrstart Solr popularizers
community: https://www.linkedin.com/groups?gid=6713853




RE: copy EnumField to text field

2014-07-28 Thread Elran Dvir
So if I have a document without severity, I can't see severity has its default 
value (0) in the stage of copy fields (in class DocumentBuilder)?

Thanks.

-Original Message-
From: Jack Krupansky [mailto:j...@basetechnology.com] 
Sent: Monday, July 28, 2014 2:39 PM
To: solr-user@lucene.apache.org
Subject: Re: copy EnumField to text field

There is a distinction between the original source input value for the indexing 
process and what value is actually indexed. Query searching will see whatever 
is actually indexed, not the original source input value. An URP could 
explicitly set the source input value to the default value if it is missing, 
but you would have to specify an explicit value for the URP to use.

-- Jack Krupansky

-Original Message-
From: Elran Dvir
Sent: Monday, July 28, 2014 4:12 AM
To: solr-user@lucene.apache.org
Subject: RE: copy EnumField to text field

Are you saying that default values are for query and not for indexing?

Thanks.

-Original Message-
From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
Sent: Monday, July 28, 2014 9:38 AM
To: solr-user
Subject: Re: copy EnumField to text field

On Mon, Jul 28, 2014 at 1:31 PM, Elran Dvir  wrote:
> But when no value is sent with severity, and the default of 0 is used, 
> the fix doesn't seem to work.

I guess the default in this case is figured out at the query time because there 
is no empty value as such. So that would be too late for copyField. If I am 
right, then you could probably use UpdateRequestProcessor and set the default 
value explicitly (DefaultValueUpdateProcessorFactory).

Regards,
   Alex.

Personal: http://www.outerthoughts.com/ and @arafalov Solr resources and
newsletter: http://www.solr-start.com/ and @solrstart Solr popularizers
community: https://www.linkedin.com/groups?gid=6713853



Re: Perm Gen issues in SolrCloud

2014-07-28 Thread Poornima Jay
Hi Nitin,

Not sure if you have tried these steps.

1. Stop the Tomcat server.
2. Find catalina.bat.
3. Assign the following line to the JAVA_OPTS variable and add it into the
   catalina.bat file:
   set JAVA_OPTS=-server -Xms512M -Xmx768M -XX:MaxPermSize=256m
4. Restart.
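
If PermGen keeps growing even with a larger cap, enabling class unloading
may also help (these flags assume the HotSpot CMS collector on Java 7):

set JAVA_OPTS=-server -Xms512M -Xmx768M -XX:MaxPermSize=256m -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled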



On Saturday, 1 March 2014 6:02 AM, KNitin  wrote:
 


Hi Furkan

I have read that before but I haven't added any new classes or changed
anything with my setup. I just created more collections in solr. How will
that increase perm gen space ? Doesn't solr intern strings at all ?
Interned strings also go to the perm gen space right?

- Nitin



On Fri, Feb 28, 2014 at 3:11 PM, Furkan KAMACI wrote:

> Hi;
>
> Jack has an answer for a PermGen usages:
>
> "PermGen memory has to do with number of classes loaded, rather than
> documents.
>
> Here are a couple of pages that help explain Java PermGen issues. The bottom
> line is that you can increase the PermGen space, or enable unloading of
> classes, or at least trace class loading to see why the problem occurs.
>
> http://stackoverflow.com/questions/88235/how-to-deal-with-java-lang-outofmemoryerror-permgen-space-error
>
> http://www.brokenbuild.com/blog/2006/08/04/java-jvm-gc-permgen-and-memory-options/
> "
>
> You can see the conversation from here:
> http://search-lucene.com/m/iMaR11lgj3Q1/permgen&subj=PermGen+OOM+Error
>
> Thanks;
> Furkan KAMACI
>
>
> 2014-02-28 21:37 GMT+02:00 KNitin :
>
> > Hi
> >
> >  I am seeing the Perm Gen usage increase as i keep adding more
> collections.
> > What kind of strings get interned in solr? (Only schema , fields,
> > collection metadata or the data itself?)
> >
> > Will Permgen space (atleast interned strings) increase proportional to
> the
> > size of the data in the collections or with the # of collections
> > themselves?
> >
> >
> > I have temporarily increased the size of PermGen to deal with this but
> > would love to understand what goes on behind the scenes
> >
> > Thanks
> > Nitin
> >
>

Re: copy EnumField to text field

2014-07-28 Thread Jack Krupansky
There is a distinction between the original source input value for the 
indexing process and what value is actually indexed. Query searching will 
see whatever is actually indexed, not the original source input value. An 
URP could explicitly set the source input value to the default value if it 
is missing, but you would have to specify an explicit value for the URP to 
use.


-- Jack Krupansky

-Original Message- 
From: Elran Dvir

Sent: Monday, July 28, 2014 4:12 AM
To: solr-user@lucene.apache.org
Subject: RE: copy EnumField to text field

Are you saying that default values are for query and not for indexing?

Thanks.

-Original Message-
From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
Sent: Monday, July 28, 2014 9:38 AM
To: solr-user
Subject: Re: copy EnumField to text field

On Mon, Jul 28, 2014 at 1:31 PM, Elran Dvir  wrote:
> But when no value is sent with severity, and the default of 0 is used, the
> fix doesn't seem to work.


I guess the default in this case is figured out at the query time because 
there is no empty value as such. So that would be too late for copyField. If 
I am right, then you could probably use UpdateRequestProcessor and set the 
default value explicitly (DefaultValueUpdateProcessorFactory).


Regards,
  Alex.

Personal: http://www.outerthoughts.com/ and @arafalov Solr resources and 
newsletter: http://www.solr-start.com/ and @solrstart Solr popularizers 
community: https://www.linkedin.com/groups?gid=6713853





Bloom filter

2014-07-28 Thread Per Steffensen

Hi

Where can I find documentation on how to use Bloom filters in Solr 
(4.4). http://wiki.apache.org/solr/BloomIndexComponent seems to be 
outdated - there is no BloomIndexComponent included in 4.4 code.


Regards, Per Steffensen


solr uima and opencalais

2014-07-28 Thread tomcool
Hi.

I'm looking into different possibilities of named-entity extraction features
offered by Solr uima.
The OpenCalais web service would fit my needs, but I can't get it to work
right.

First question : is the openCalais annotator up to date ?

Right now, I can send a request to the OpenCalais service successfully, but
when I try to write the response to a Solr field I am not able to retrieve
useful concepts.

In my uimaProcessor I have the following mappings:

<lst name="type">
  <str name="name">org.apache.uima.calais.Country</str>
  <lst name="mapping">
    <str name="feature">coveredText</str>
    <str name="field">concept</str>
  </lst>
</lst>
If, instead of the feature "coveredText", I use "calaisType" (from the
openCalaisAnnotator), the field is filled with the following URI:
"http://s.opencalais.com/1/type/sys/InstanceInfo"

I tried with several other feature names but I don't retrieve anything
useful.

Is there a particular feature I should be using ? 






--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-uima-and-opencalais-tp4149454.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Passivate core in Solr Cloud

2014-07-28 Thread aurelien . mazoyer

Thank you Erick,

Ok, I will probably perform some tests. It seems to be a good candidate 
for a future blog post...


Regards,

Aurelien

On 27.07.2014 20:20, Erick Erickson wrote:

"Does not play nice" really means it was designed to run in a
non-distributed mode. There has
been no work done to verify that it does work in cloud mode, I fully 
expect

some "interesting"
problems in that mode. If/when we get to it that is.

About replication: I haven't heard of any problems, but I also haven't
heard of it
working in that environment. I expect that it'll only try to replicate 
when

it's
loaded, so that might be interesting

Best,
Erick


On Thu, Jul 24, 2014 at 6:49 AM, Aurélien MAZOYER <
aurelien.mazo...@francelabs.com> wrote:


Thank you Erick and Alex for your answers. The LotsOfCores stuff seems to
meet my requirement, but it is a problem if it does not work with Solr
Cloud. Is there an issue opened for this problem?
If I understand well, the only solution for me is to use multiple
mono-instances of Solr using transient cores and to distribute the cores
for my tenants manually (I assume the LRU mechanism will be less effective
as it will be done per Solr instance).
When you say "does NOT play nice with distributed mode", does it also
include the standard replication mechanism?

Thanks,

Regards,

Aurelien



Le 23/07/2014 17:21, Erick Erickson a écrit :

Do note that the LotsOfCores stuff does NOT play nice in
distributed mode (yet).

Best,
Erick


On Wed, Jul 23, 2014 at 6:00 AM, Alexandre 
Rafalovitch
>
wrote:

Solr has some support for a large number of cores, including transient
cores: http://wiki.apache.org/solr/LotsOfCores

Regards,
Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
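
A minimal sketch of the knobs that wiki page describes (old-style solr.xml;
the names below are illustrative): cores marked transient compete for an LRU
cache of transientCacheSize slots and are unloaded when evicted.

<solr persistent="true">
  <cores adminPath="/admin/cores" transientCacheSize="50">
    <core name="customer1" instanceDir="customer1"
          transient="true" loadOnStartup="false"/>
  </cores>
</solr>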



On Wed, Jul 23, 2014 at 7:55 PM, Aurélien MAZOYER  wrote:

> Hello,
>
> We want to set up a Solr Cloud cluster in order to handle a high volume
> of documents with a multi-tenant architecture. The problem is that
> application-level isolation for a tenant (using a mutual index with a
> field "customer") is not enough to fit our requirements. As a result, we
> need 1 collection/customer. There are more than a thousand customers, and
> it seems unreasonable to create thousands of collections in Solr Cloud...
> But as we know that there is less than 1 query/customer/day, we are
> currently looking for a way to passivate collections when they are not in
> use. Can this be a good idea? If yes, are there best practices to
> implement this? What side effects can we expect? Do we need to put some
> application-level logic on top of the Solr Cloud cluster to choose which
> collection to unload (and maybe there is something smarter (and quicker?)
> than simply loading/unloading the core when it is not in use)?
>
> Thank you for your answer(s),
>
> Aurelien






Re: Auto Suggest

2014-07-28 Thread benjelloun
Hello Erick,

So in your opinion, what is the solution to use autosuggest with sentences? :)
An example would be very helpful.

Thanks,
best regards,
Anass BENJELLOUN



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Auto-Suggest-tp4149004p4149441.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: copy EnumField to text field

2014-07-28 Thread Elran Dvir
Are you saying that default values are for query and not for indexing?

Thanks.

-Original Message-
From: Alexandre Rafalovitch [mailto:arafa...@gmail.com] 
Sent: Monday, July 28, 2014 9:38 AM
To: solr-user
Subject: Re: copy EnumField to text field

On Mon, Jul 28, 2014 at 1:31 PM, Elran Dvir  wrote:
> But when no value is sent with severity, and the default of 0 is used, the 
> fix doesn't seem to work.

I guess the default in this case is figured out at the query time because there 
is no empty value as such. So that would be too late for copyField. If I am 
right, then you could probably use UpdateRequestProcessor and set the default 
value explicitly (DefaultValueUpdateProcessorFactory).
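
If that route works, a hedged sketch of the solrconfig.xml wiring (the field
name "severity" and the chain name are assumptions):

<updateRequestProcessorChain name="add-defaults">
  <processor class="solr.DefaultValueUpdateProcessorFactory">
    <str name="fieldName">severity</str>
    <str name="value">0</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>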

Regards,
   Alex.

Personal: http://www.outerthoughts.com/ and @arafalov Solr resources and 
newsletter: http://www.solr-start.com/ and @solrstart Solr popularizers 
community: https://www.linkedin.com/groups?gid=6713853
