The impression that I got from reading the mailing lists is that the
developers are slowly moving to deprecate all the parser plugins in favor of
Tika - but that this process is not quite finished in the 1.1 release, and
that the Tika plugin is still a little wonky. Is this correct?
-MB
--
I think I resolved the issue.
The way to set up subcollections.xml is NOT this:
stylebook
sb
http://mysite.mydomain.com/guidance/
http://mysite/guidance/
It needs to be set up the following way:
stylebook
sb
http://cnnlibrary.turner.com/guidance/
http://cnnlibrary/guidance/
Each pa
As Raj said, I forgot to build the source; you can build it with
ant.
--
Yavuz Selim YILMAZ
2010/9/8 Richard Huang
> Can u share how you resolved it? Thanks.
>
> Sent from my iPhone
>
> On Sep 8, 2010, at 1:33 AM, Yavuz Selim YILMAZ
> wrote:
>
> > Ok Raj, I solved the problem, thnx.
>
I'll try to give it a shot this week for the 1.2 branch and trunk if it isn't
too different. It shouldn't be too hard, and Julien's explanation of how to
read the configuration makes a lot of sense.
On Wednesday 08 September 2010 16:37:29 Mattmann, Chris A (388J) wrote:
> Hi Markus,
>
> > Inter
Hi Markus,
> Interesting! But can the mime extractor return more than one type for a given
> file in Nutch?
Sure. Nutch metadata is a named field -> multi-value structure, so a file (or
piece of content) can certainly have more than one type.
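To illustrate the point above, here is a minimal sketch assuming a hypothetical `MultiValueMetadata` class (a stand-in written for this example, not Nutch's own metadata API): each field name maps to a list of values, so one document can hold several types. The `addTypeValues` helper, also hypothetical, stores the full content type plus its primary type and subtype under a single `type` field, which is the split the MoreIndexingFilter javadoc describes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MultiValueMetadataDemo {

    /** Hypothetical stand-in for a named-field, multi-value metadata store. */
    static final class MultiValueMetadata {
        private final Map<String, List<String>> fields = new LinkedHashMap<>();

        /** Appends a value; earlier values under the same name are kept. */
        void add(String name, String value) {
            fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
        }

        /** Returns every value stored under a name (empty if none). */
        List<String> getValues(String name) {
            List<String> values = fields.get(name);
            return values == null ? Collections.<String>emptyList()
                                  : Collections.unmodifiableList(values);
        }
    }

    /** Hypothetical helper: stores the full type, primary type and subtype under "type". */
    static void addTypeValues(MultiValueMetadata md, String contentType) {
        md.add("type", contentType);                          // e.g. "text/html"
        int slash = contentType.indexOf('/');
        if (slash > 0 && slash < contentType.length() - 1) {
            md.add("type", contentType.substring(0, slash));  // primary type, e.g. "text"
            md.add("type", contentType.substring(slash + 1)); // subtype, e.g. "html"
        }
    }

    public static void main(String[] args) {
        MultiValueMetadata md = new MultiValueMetadata();
        addTypeValues(md, "text/html");
        System.out.println(md.getValues("type")); // prints [text/html, text, html]
    }
}
```

Because the field keeps a list rather than a single value, nothing is overwritten when a second or third type is added; that is also why such a field has to be declared multi-valued on the search side.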
> I see, but in that case it would be helpful if the canon
Hello Chris,
On Wednesday 08 September 2010 16:17:30 Mattmann, Chris A (388J) wrote:
> Hi Markus,
>
> In fact, there are plenty of times that files have > 1 mime type. There is
> an entire classification scheme from IANA that defines parent-child
> relationships between mime types (such as the n
Hi Markus,
In fact, there are plenty of times when files have more than one mime type. There
is an entire classification scheme from IANA that defines parent-child
relationships between mime types (such as the notion that text/xml is a
descendant of text/plain).
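One way to picture such parent-child relationships is a lookup table walked upward from a type to its ancestors. This is a sketch only: the `PARENT` table and `withAncestors` method are hypothetical, and apart from text/xml -> text/plain (the example given above) the entries are illustrative rather than taken from any registry.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class MimeHierarchyDemo {

    // Hypothetical child -> parent table. Only text/xml -> text/plain comes from
    // the discussion; text/html -> text/plain is an illustrative entry.
    static final Map<String, String> PARENT = new HashMap<>();
    static {
        PARENT.put("text/xml", "text/plain");
        PARENT.put("text/html", "text/plain");
    }

    /** Returns the type followed by its ancestors, nearest first; stops on a cycle. */
    static List<String> withAncestors(String type) {
        List<String> chain = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        for (String t = type; t != null && seen.add(t); t = PARENT.get(t)) {
            chain.add(t);
        }
        return chain;
    }

    public static void main(String[] args) {
        System.out.println(withAncestors("text/xml"));  // prints [text/xml, text/plain]
        System.out.println(withAncestors("image/png")); // prints [image/png]
    }
}
```

With this approach, a document's multi-valued type field could store both text/xml and text/plain, so a query on either would match it.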
The current index-more plugin splits up mi
The Hadoop user list would be a better place to ask this.
2010/9/8 yi zhu
>
> I've been running a 2-datanode cluster for a crawling job, and now I need to
> add a new node to the cluster without stopping it.
>
> I added a new line in conf/slaves; what should I do next? stop-all.sh and
> start-all.sh s
Thank you. We are able to see the metadata on the Nutch side using bin/nutch
org.apache.nutch.parse.ParserChecker *, but cannot see the metadata on the Solr
side. We have added metadata fields in solrmapping and also checked our
schema.xml on both Nutch and Solr. Are there any additional conf
One plugin can add multiple and different fields.
In the schema.xml you can map your new fields coming from Nutch. But I
don't really know about solrmapping.xml.
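The point that one plugin can add several fields, rather than needing one plugin per metadata key, can be sketched as below. Everything here is hypothetical: `MetadataFieldsFilter`, its `filter` signature, and the `author`/`keywords`/`description` keys are illustrations written for this example, not Nutch's `IndexingFilter` API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MultiFieldFilterDemo {

    /** Hypothetical indexing step; NOT Nutch's real IndexingFilter interface. */
    static final class MetadataFieldsFilter {
        private final List<String> keys;

        MetadataFieldsFilter(List<String> keys) {
            this.keys = new ArrayList<>(keys);
        }

        /** Copies each configured key found in the parse metadata into its own field. */
        Map<String, List<String>> filter(Map<String, String> parseMeta,
                                         Map<String, List<String>> doc) {
            for (String key : keys) {
                String value = parseMeta.get(key);
                if (value != null) { // keys absent from this page are skipped
                    doc.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
                }
            }
            return doc;
        }
    }

    public static void main(String[] args) {
        // Made-up metadata for one page; "description" is configured but absent.
        Map<String, String> parseMeta = new HashMap<>();
        parseMeta.put("author", "Jane Doe");
        parseMeta.put("keywords", "nutch, solr");

        MetadataFieldsFilter filter =
                new MetadataFieldsFilter(Arrays.asList("author", "keywords", "description"));
        Map<String, List<String>> doc = filter.filter(parseMeta, new TreeMap<>());
        System.out.println(doc); // prints {author=[Jane Doe], keywords=[nutch, solr]}
    }
}
```

Pages that lack some of the configured keys simply produce fewer fields. Each field that is produced would still need a matching field (or dynamic field) in Solr's schema.xml.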
On 10/09/08 07:35, Yavuz Selim YILMAZ wrote:
More than one field, then define a new plugin per new metadata?
Different pages ha
The message you sent is not an error, it is a warning. It should still
compile. Please follow the steps at http://github.com/enis/gora
Cheers,
Enis
On Thu, Sep 2, 2010 at 10:16 PM, Nemani, Raj wrote:
> All,
>
>
>
> I am trying to compile Gora in order to compile the latest Nutch trunk. I am doing
> the fo
Hi,
I think we need to commit all the necessary files to Nutch so that it can
work out of the box for SQL, HBase and Cassandra. We can even write
commented-out entries in gora.properties, nutch-site.xml, etc. so that using
Nutch with different backends becomes a configuration change. I will open a
I've been running a 2-datanode cluster for a crawling job, and now I need to
add a new node to the cluster without stopping it.
I added a new line in conf/slaves; what should I do next? stop-all.sh and
start-all.sh should work, but they seem to stop all running jobs in the cluster.
Can u share how you resolved it? Thanks.
Sent from my iPhone
On Sep 8, 2010, at 1:33 AM, Yavuz Selim YILMAZ wrote:
> Ok Raj, I solved the problem, thnx.
> --
>
> Yavuz Selim YILMAZ
>
>
> 2010/9/7 Nemani, Raj
>
>> Oh wait, your command looks wrong too (dunno if that was a typo)
>>
>> I
> Perhaps someone could give a pointer on how to read a configuration setting
> for a plug-in and where to store the setting (Nutch config or plugin.xml)
> and I might actually write my first Java code in four years!
>
You'd typically do that by adding something like
conf.getBoolean("
Hi guys,
I've summarized the steps to follow for running GORA+HBase with Nutch 2.0 on
http://wiki.apache.org/nutch/GORA_HBase
Feel free to amend and improve as you see fit.
Please bear in mind that Nutch 2.0 is at a very early stage and is far from
being bug-proof; see in particular [1].
HTH
Ju
Julien,
I've filed an issue [1], but I cannot, at this moment, provide a patch that
enables configuration of this feature. I did disable it in my checkout,
though.
Perhaps someone could give a pointer on how to read a configuration setting
for a plug-in and where to store the setting (Nutch con
This description fooled me once too, but hasn't it been patched yet? Now it is
[1]; please commit.
[1]: https://issues.apache.org/jira/browse/NUTCH-900
On Wednesday 14 July 2010 07:10:47 Mattmann, Chris A (388J) wrote:
> No problem, Brad! If you'd like feel free to create an issue in Nutch JIRA
>
Hi Markus,
Your analysis is correct; see the comments in the MoreIndexingFilter:
*
* Add Content-Type and its primaryType and subType add contentType,
* primaryType and subType to field "type" as un-stored, indexed and
* un-tokenized, so that search results can be confined by contentT
I've checked the MoreIndexingFilter sources and my suspicions were right: it
really does split the input in the getParts method. I'd love to have this removed
and committed, but I guess more work is needed to keep it compatible, such as
tokenizing it to keep it searchable, which would require an sche
Hi,
I'm testing the index-more plug-in but, to my surprise, it is defined as a
multi-valued field in the shipped Solr schema configuration! Since when do
files have more than one mime type?
Well, they don't! It seems the plug-in splits mime types by slash and exports
three terms per document,
Also, for HTML, must metadata be in the "head", or can it be in the "body"?
--
Yavuz Selim YILMAZ
2010/9/8 Yavuz Selim YILMAZ
> More than one field, then define a new plugin per new metadata?
>
> Different pages have different extra metadata, then would it be
> configured in schema.xml and solrm