Yes, if you use "+" in your expression, then the numeric sum will be computed
if the fields are integers, and the concatenation if they are strings; the former
will not yield the desired uniqueness. But now that I think about it some more,
even the latter will not. Here's why: if the fields in
The mapping: tuple -> dedup-key needs to be 1-1; if multiple tuples are mapped
to the same dedup key you'll see problems like this. In your case multiple
tuples can be mapped to the same value of "id + id1". For example, all tuples
with (id, id1) equal to any of these pairs will map to a
It needs to be an expression that combines both (or all) values: try "id + id1"
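To make the collision concrete, here is a tiny illustrative snippet (the values are made up): numeric "+" collapses distinct pairs, plain string concatenation does too, and a separator character restores a 1-1 mapping (assuming the fields never contain the separator):

```java
public class DedupKeyDemo {
    // Numeric "+": distinct (id, id1) pairs collapse to the same sum.
    static int numericKey(int id, int id1) { return id + id1; }

    // String "+" without a separator: ("1","23") and ("12","3") both give "123".
    static String concatKey(String id, String id1) { return id + id1; }

    // A separator restores a 1-1 mapping (assuming fields never contain '|').
    static String separatedKey(String id, String id1) { return id + "|" + id1; }

    public static void main(String[] args) {
        System.out.println(numericKey(1, 2) == numericKey(2, 1));                    // collision
        System.out.println(concatKey("1", "23").equals(concatKey("12", "3")));       // collision
        System.out.println(separatedKey("1", "23").equals(separatedKey("12", "3"))); // distinct
    }
}
```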
Ram
On Monday, October 23, 2017, 6:04:14 PM PDT, Vivek Bhide
wrote:
Thanks Ram for your suggestions
The field types that I am trying are the basic primitive types. In fact, I was
just
Could you provide some details on what types your multiple keys are and what
expression variants you tried and what the result was ?
The base class of BoundedDedupOperator is AbstractDeduper; within that class
you'll see a method getKey(); you should be able to override that to retrieve
the
If log file aggregation is enabled, aggregated logs for terminated applications
should be available on HDFS and retrievable with:
yarn logs -applicationId <applicationId>
Log aggregation is discussed further at:
https://hortonworks.com/blog/simplifying-user-logs-management-and-access-in-yarn/
and is controlled
https://ci.apache.org/projects/apex-core/apex-core-javadoc-release-3.4/index.html
https://ci.apache.org/projects/apex-core/apex-core-javadoc-release-3.5/index.html
... at the above links.
Ram
--
___
Munagala V. Ramanath
Software Engineer
Take a look at the testCompression() method
in AbstractFileOutputOperatorTest.java for an
example.
Ram
On Mon, Mar 13, 2017 at 5:25 PM, Ganelin, Ilya
wrote:
> What is the recommended way to write compressed data to HDFS? Should I
> extend AbstractFileOutputOperator
g from the code above which
> starts
> > > > with 'Marking operator '
> > > >
> > > > Regards,
> > > > Ashwin.
> > > >
> > > > On Tue, Feb 28, 2017 at 12:03 PM, Sunil Parmar <
> spar.
It likely means that the operator is taking too long to return from one of
the callbacks like beginWindow(), endWindow(),
emitTuples(), etc. Do you have any potentially blocking calls to external
systems in any of those callbacks ?
Ram
On Tue, Feb 28, 2017 at 11:09 AM, Sunil Parmar
>> public transient DefaultInputPort<String> kafkaStreamInput =
>>     new DefaultInputPort<String>() {
>>
>>   List<String> errors = new ArrayList<>();
>>
>>   @Override
>>   public void process(String consumerRecord) {
>>
Looks like you may be blocking the operator thread with the p.waitFor() and
the rest of the code
to process the child process output.
Try using a separate thread to handle the child as described, for example,
here:
http://stackoverflow.com/questions/26319804/adapting-c-fork-code-to-a-java-program
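A minimal sketch of that pattern (command and names are illustrative, not from the original thread): stdout is drained on a separate "gobbler" thread so the pipe buffer can never fill up and block, and in a real operator the whole `waitFor()` sequence would itself live off the operator thread:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;

public class ChildProcessDemo {
    // Run a child process, draining its stdout on a separate thread so the
    // caller is never blocked on a full pipe buffer while waiting.
    static String runChild(String... cmd) throws Exception {
        Process p = new ProcessBuilder(cmd).redirectErrorStream(true).start();
        StringBuilder out = new StringBuilder();
        Thread gobbler = new Thread(() -> {
            try (BufferedReader r = new BufferedReader(
                    new InputStreamReader(p.getInputStream()))) {
                String line;
                while ((line = r.readLine()) != null) out.append(line).append('\n');
            } catch (Exception ignored) { }
        });
        gobbler.start();
        p.waitFor();     // in a real operator, do this off the operator thread
        gobbler.join();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.print(runChild("echo", "hello"));
    }
}
```

(The example assumes a POSIX `echo` is available.)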
ail.com>
wrote:
> Hi Ram,
>
> We are running a DataTorrent application using the Apex 3.2.1 build.
> Santry i.e. authentication required to access DT console.
>
>
>
> On Mon, Feb 6, 2017 at 10:16 PM, Munagala Ramanath <r...@datatorrent.com>
> wrote:
>
Could you provide more details on what you're running, i.e. which version
of Apex, etc. ?
Also, what is santry ?
Do any of the other REST API calls work, like "/about" for instance ?
Ram
On Mon, Feb 6, 2017 at 8:35 AM, chiranjeevi vasupilli
wrote:
> bringing on top..
>
>
There are multiple approaches, each with its own tradeoffs. Here is a first
step:
A1. Create a pair of in-memory non-transient queues to hold the tuples
(non-transient
because we want them to be checkpointed and restored on recovery
from failure).
A2. Create a separate thread that waits for
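The A1/A2 steps above can be sketched as follows (a simplified, illustrative version: the worker and queue names are made up, and in a real operator the queue would be a non-transient field drained from emitTuples()):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class QueueHandoffDemo {
    // A1: in an operator this queue would be a NON-transient field so its
    // contents are checkpointed and restored on recovery from failure.
    // A2: a separate worker thread enqueues results; the operator thread
    // drains the queue (e.g. in emitTuples()) and emits the tuples.
    static List<String> handOff(int n) throws InterruptedException {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(100);
        Thread worker = new Thread(() -> {
            for (int i = 0; i < n; i++) queue.offer("tuple-" + i);
        });
        worker.start();
        worker.join();
        List<String> emitted = new ArrayList<>();
        queue.drainTo(emitted);   // what emitTuples() would emit this window
        return emitted;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(handOff(3));
    }
}
```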
The operator has this comment at the top:
 * This implementation is generic so it uses the offset/limit mechanism for
 * batching, which is not optimal. Batching is most efficient when the
 * tables/views are indexed and the query uses this information to retrieve
 * data. This can be achieved in
In addition to the methods mentioned by others, there is also the REST API
documented
at: http://docs.datatorrent.com/dtgateway_api/
The calls:
POST /ws/v2/appPackages?merge={replace|fail|ours|theirs}
POST
Once log aggregation is enabled as Priyanka described, you can retrieve
logs for a specific
application with a command like this:
yarn logs -applicationId application_1473359658420_0588
Ram
On Mon, Dec 12, 2016 at 2:29 AM, Priyanka Gugale wrote:
> You need to set hadoop
W.r.t. item 3, it may be an issue of whether the group is active or not and
whether it has been flushed from
the memory cache. Some discussion here:
http://grokbase.com/t/kafka/users/15bsxxgp49/new-kafka-consumer-groups-sh-not-showing-inactive-consumer-gorup
Ram
On Thu, Dec 8, 2016 at 8:36 AM,
Just to be sure, did that change resolve the issue ?
Ram
On Fri, Dec 2, 2016 at 11:54 AM, Max Bridgewater
wrote:
> OK. Thanks guys. It continued to fail with min 256 and max 512 or 1024. In
> the end I switched off the check of ratio virtual/physical memory. My
>
A brief description is here:
http://docs.datatorrent.com/beginner/#buffer-server
It is indeed part of apex-core; see:
https://github.com/apache/apex-core/tree/master/engine/src/main/java/com/datatorrent/stram/
as well as the *stream* subdirectory.
Ram
On Fri, Nov 25, 2016 at 11:13 AM, Max
One way is to use an external process that invokes appropriate REST API
calls and checks
results. A sample Python script to do this for the application as a whole
is at:
https://github.com/DataTorrent/examples/blob/master/tools/monitor.py
The REST API is documented at:
The .apa file content can be listed with: jar tvf ./target/*.apa
Could you post the output of "mvn dependency:tree" ?
Ram
On Fri, Nov 25, 2016 at 4:15 AM, prakashdutt...@hotmail.com <
prakashdutt...@hotmail.com> wrote:
> The apa lists message is showing as below
>
> apex> launch
Since the diagnostic is about exceeding virtual memory limits, please see:
http://docs.datatorrent.com/configuration/#yarn-vmem-pmem-ratio-tuning
for an alternative solution.
Ram
On Mon, Nov 21, 2016 at 4:20 AM, Max Bridgewater
wrote:
> The issue turned out to be
For Req1, a better approach might be to have a single "error reporting"
operator that connects to the DB;
all the other operators can send custom error tuples to it with appropriate
details of the type of error.
That way, if there are DB-related issues, you have one place to
debug them.
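A minimal sketch of what such a custom error tuple might look like (all names here are hypothetical, not from Malhar): each operator emits these to the one central operator that owns the DB connection.

```java
public class ErrorTupleDemo {
    // Hypothetical error tuple carrying enough context for one central
    // "error reporting" operator to persist it and for later debugging.
    static class ErrorTuple {
        final String operatorName, errorType, detail;
        ErrorTuple(String operatorName, String errorType, String detail) {
            this.operatorName = operatorName;
            this.errorType = errorType;
            this.detail = detail;
        }
        @Override public String toString() {
            return operatorName + ":" + errorType + ":" + detail;
        }
    }

    static String format(String op, String type, String detail) {
        return new ErrorTuple(op, type, detail).toString();
    }

    public static void main(String[] args) {
        System.out.println(format("parser", "PARSE_ERROR", "bad record 42"));
    }
}
```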
Couple of questions:
Could you outline the steps you followed to run this application ?
Do you have any custom code that is creating or modifying a Configuration
object ?
Can you try building and running some of the example programs at:
> I have set the YARN minimum allocation property in property.xml of the
> project, which doesn't solve the problem.
Setting it in your project will not help; you need to set it in
yarn-site.xml
and restart YARN. If you're running on the sandbox, the location of that
file is in my earlier
To add to Ashwin's diagnosis, assuming you have properties.xml set up as
described in that tutorial,
you need to add to the YARN configuration file yarn-site.xml at:
/sfw/hadoop/current/etc/hadoop
the following stanza:

  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>128</value>
  </property>

and restart the Hadoop
If you want round-robin distribution, which will give you uniform load
across all partitions, you can use
a StreamCodec like this (provided the number of partitions is known and
static):

public class CatagoryStreamCodec extends KryoSerializableStreamCodec {
    private int n = 0;
    @Override
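The quoted code is cut off in the archive; a standalone sketch of the round-robin getPartition() logic such a codec would add is below (the class here deliberately does not extend the Malhar base class KryoSerializableStreamCodec, so the surrounding wiring is an assumption; only the counter logic is the point):

```java
public class RoundRobinCodecDemo {
    // In the real codec this method would be the @Override of
    // getPartition() inherited from the Malhar stream codec base class.
    private int n = 0;
    private final int numPartitions;

    RoundRobinCodecDemo(int numPartitions) { this.numPartitions = numPartitions; }

    public int getPartition(Object tuple) {
        return n++ % numPartitions;   // cycles 0, 1, ..., numPartitions-1
    }

    public static void main(String[] args) {
        RoundRobinCodecDemo codec = new RoundRobinCodecDemo(3);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 6; i++) sb.append(codec.getPartition("t")).append(' ');
        System.out.println(sb.toString().trim());
    }
}
```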
> Yes I have used the apex command to launch the apa file.
>
>
> Thanks
> Sanal
>
> On Tue, Sep 13, 2016 at 1:29 AM, Munagala Ramanath <r...@datatorrent.com>
> wrote:
>
>> Looks like this happens if jersey-server is not present
>> (e.g. http://stacko
Yes, if you have ports, at least one must be connected if there are no
annotations on them.
The code is in LogicalPlan.validate() -- check out the allPortsOptional
variable.
Ram
On Tue, Sep 13, 2016 at 3:17 AM, Tushar Gosavi
wrote:
> Hi All,
>
> I have an input operator
Looks like this happens if jersey-server is not present
(e.g.
http://stackoverflow.com/questions/8662919/jersey-no-webapplication-provider-is-present-when-jersey-json-dependency-added
)
Have you made any changes to the pom.xml ? If so, can you post it here ?
Also, can you tell us a bit more
It looks like there is not enough memory for all the containers. If you're
running in the sandbox, how much memory is allocated to the sandbox ?
Can you try increasing it ?
There is also a detailed discussion of how to manage memory using
configuration
parameters in the properties files at:
If you are dealing with file data purely as byte arrays and copying them
from one place to another, you need not worry about the language or charset
since the bytes are preserved.
If you are converting them to Strings explicitly, or using classes that might
do so implicitly, you need to specify an explicit charset.
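A small illustration of both points (the string is made up): byte-for-byte copies are charset-agnostic, but decoding the same bytes with the wrong charset silently corrupts the text.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CharsetDemo {
    static final String TEXT = "caf\u00e9";  // "café" — a non-ASCII example

    // Pure byte copy: content preserved regardless of language/charset.
    static boolean bytesSurviveCopy() {
        byte[] raw = TEXT.getBytes(StandardCharsets.UTF_8);
        byte[] copy = Arrays.copyOf(raw, raw.length);
        return Arrays.equals(raw, copy);
    }

    // Decoding UTF-8 bytes with the wrong charset corrupts the text.
    static boolean wrongCharsetCorrupts() {
        byte[] raw = TEXT.getBytes(StandardCharsets.UTF_8);
        String bad = new String(raw, StandardCharsets.ISO_8859_1);
        return !bad.equals(TEXT);
    }

    public static void main(String[] args) {
        System.out.println(bytesSurviveCopy());
        System.out.println(wrongCharsetCorrupts());
    }
}
```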
AM, McCullough, Alex <
alex.mccullo...@capitalone.com> wrote:
> Thanks Ram.
>
>
>
> If I didn’t want dynamic partitioning and just round robin on a fixed # of
> partitions, can it just be set through a property? If so, what is the
> property?
>
>
>
> *From: *Munagal
For cases where use of a different thread is needed, it can write tuples to
a queue from where the operator thread pulls them -- JdbcPollInputOperator
in Malhar has an example.
Ram
On Wed, Aug 10, 2016 at 1:50 PM, hsy...@gmail.com wrote:
> Hey Vlad,
>
> Thanks for bringing
"relinquish" => "acquire" :-)
Ram
On Fri, Aug 5, 2016 at 6:01 AM, Chinmay Kolhatkar
wrote:
> Hi Alex,
>
> setup and activate both are guaranteed to be called before starting of
> dataflow. Hence in your case either should be fine.
>
> There might be few hundred ms of
Which version of Hadoop ? If it is 2.6.X, upgrading to 2.7.X might fix the
problem, e.g.
https://marc.info/?l=hadoop-user&m=142501302419779&w=2
Ram
On Tue, Jul 26, 2016 at 1:59 PM, Raja.Aravapalli wrote:
>
> Hi,
>
> My DAG fails every week at least once with the below
One way is to have a pass-through operator X that is parallel partitioned
like your B currently.
Then, connect the output port of X to B and use a suitable partitioner for
B to create as many
partitions as you want: A -> X -> B -> C.
Ram
On Mon, Jul 25, 2016 at 9:41 AM, Yogi Devendra
> The troubleshooting page sounds like the appropriate location. I shall raise
> a PR with the given suggestions.
>
> --prad
>
> On Tue, Jul 19, 2016 at 5:49 AM, Munagala Ramanath <r...@datatorrent.com>
> wrote:
>
> > There is already a link to a troubleshooting page at bottom of
> > https:
There is already a link to a troubleshooting page at the bottom of
https://apex.apache.org/docs.html
That page already has some discussion under the section entitled
"Calculating Container Memory"
so adding new content there seems like the right thing to do.
Ram
On Mon, Jul 18, 2016 at 11:27 PM,
Please see: http://docs.datatorrent.com/troubleshooting/#configuring-memory
Ram
On Tue, Jul 12, 2016 at 6:57 AM, Raja.Aravapalli wrote:
>
> Hi,
>
> My DAG is failing with memory issues for container. Seeing below
> information in the log.
>
>
>
> Diagnostics:
Please take a look at the Python script under
https://github.com/DataTorrent/examples/tree/master/tools
It uses the Gateway REST API to retrieve application info given the name;
the id is part of that JSON object.
Ram
On Tue, Jul 12, 2016 at 6:58 AM, Mukkamula, Suryavamshivardhan (CWM-NR) <
To add some additional explanation to what Sandesh said, it looks like your
operator is
dying or getting killed after some time, so you should look at the
Application Master logs to find
out why this is happening.
When it goes down, a new operator is created and state from an earlier
checkpoint is restored.
Nothing available out-of-the-box but there are some pieces that may be
useful:
https://github.com/apache/apex-malhar/blob/master/library/src/main/java/com/datatorrent/lib/io/AbstractHttpInputOperator.java
For the Zip part, there is an example using GZip for output here:
> 193c1c by jenkins source checksum 48db4b572827c2e9c2da66982d147626",
>
> "resourceManagerVersion": "2.7.1.2.3.2.0-2950",
>
> "resourceManagerVersionBuiltOn": "2015-09-30T18:20Z",
>
> "rmStateStoreName"
You don't need dynamic partitioning to achieve that topology. You can
simply create your DAG as: A --> X --> Y and then set the PARTITIONER
attribute on X
as discussed in the "Advanced Features" section of the TopN words tutorial
at:
http://docs.datatorrent.com/tutorials/topnwords-c7/
The
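For instance, a stanza along these lines in the application's properties file would do it (this is a hypothetical sketch: the operator name X is from the example above, the property prefix may differ in your setup, and it assumes the StatelessPartitioner bundled with Apex with a partition count of 4; check the tutorial for the exact form):

```xml
<property>
  <name>dt.operator.X.attr.PARTITIONER</name>
  <value>com.datatorrent.common.partitioner.StatelessPartitioner:4</value>
</property>
```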
e
> Create a stream
>
>
> Thanks,
> Junguk
>
> 2016-06-07 12:37 GMT-04:00 Munagala Ramanath <r...@datatorrent.com>:
>
>> Hi,
>>
>> For Dynamic Partitioning, please take a look at the example at:
>>
>> https://github.com/DataTorrent/examples/tree/
Hi,
For Dynamic Partitioning, please take a look at the example at:
https://github.com/DataTorrent/examples/tree/master/tutorials/dynamic-partition
as well as the Malhar class:
ders ?
>
>
>
> Regards,
>
> Surya Vamshi
>
>
>
> *From:* Munagala Ramanath [mailto:r...@datatorrent.com]
> *Sent:* 2016, May, 20 2:35 PM
> *To:* us...@apex.incubator.apache.org
> *Subject:* Re: Information Needed
>
>
>
> It appears that