Re: How to distcp data between two clusters which are not in the same local network?

2016-08-15 Thread Wei-Chiu Chuang
Hello, if I understand your question correctly, you are actually building a multi-homed Hadoop cluster, correct? A multi-homed Hadoop cluster can be tricky to set up, to the extent that Cloudera does not recommend it. I've not set up a multi-homed Hadoop cluster before, but I think you have to make sure the

Re: Teradata into hadoop Migration

2016-08-05 Thread Wei-Chiu Chuang
oudera Wei-Chiu Chuang A very happy Clouderan > On Aug 4, 2016, at 10:50 PM, Rakesh Radhakrishnan <rake...@apache.org> wrote: > > Sorry, I don't have much insight about this apart from basic Sqoop. AFAIK, it > is more of vendor specific, you may need to dig more into that lin

Re: How to speed up the building process of Hadoop?

2016-09-30 Thread Wei-Chiu Chuang
One suggestion: add the parameter -Dmaven.javadoc.skip=true, which skips building javadocs. For me this reduces overall build time to about 2 minutes. > On Sep 30, 2016, at 5:40 AM, Mohammed Q. Hussian wrote: > > Hi All. > > I'm building Hadoop from source using the following

Re: native snappy library not available: this version of libhadoop was built without snappy support.

2016-10-04 Thread Wei-Chiu Chuang
Hi Uthayan, what’s the version of Hadoop you have? Hadoop 2.7.3 binary does not ship with snappy precompiled. If this is the version you have you may have to rebuild Hadoop yourself to include it. Wei-Chiu Chuang > On Oct 4, 2016, at 12:59 PM, Uthayan Suthakar <uthayan.sutha...@gma

Re: native snappy library not available: this version of libhadoop was built without snappy support.

2016-10-04 Thread Wei-Chiu Chuang
It seems to me this issue is the direct result of MAPREDUCE-6577 <https://issues.apache.org/jira/browse/MAPREDUCE-6577>. Since you’re on a CDH cluster, I would suggest moving up to CDH5.7.2 or above, where this bug is fixed. Best, Wei-Chiu Chuang > On Oct 4, 2016, at 1:26 PM,

Re: [How to :]Apache Impala and S3n File System

2016-09-19 Thread Wei-Chiu Chuang
I think the Impala user mailing list may have a better answer. Forwarding your question there.

Re: hdfs2.7.3 kerberos can not startup

2016-09-20 Thread Wei-Chiu Chuang
You need to run the kinit command to authenticate before running the hdfs dfs -ls command. Wei-Chiu Chuang > On Sep 20, 2016, at 6:59 PM, kevin <kiss.kevin...@gmail.com> wrote: > > Thank you Brahma Reddy Battula. > It's because of my problem of the hdfs-site config file and https c

Re: Hadoop KMS, security module

2016-09-26 Thread Wei-Chiu Chuang
Hi, I'm not an expert in Hadoop KMS. But as far as I know Hadoop KMS itself does not rely on particular hardware for the purpose. The Hadoop KMS implementation is based on Java Provider API https://docs.oracle.com/javase/7/docs/api/java/security/Provider.html It looks like though there is ongoing

Re: Secure Hadoop - invalid Kerberos principal errors

2016-10-20 Thread Wei-Chiu Chuang
oop-common/SecureMode.html#Kerberos_principals_for_Hadoop_Daemons> > dfs.journalnode.kerberos.principal > hdfs/aw1hdnn001.tnbsound@tnbsound.com Wei-Chiu Chuang A very happy Clouderan > On Oct 20, 2016, at 10:19 AM, Mark Selby <mse...@pandora.com> wrote: > > We

Re: Authentication Failure talking to Ranger KMS

2016-10-11 Thread Wei-Chiu Chuang
Seems to me you encountered this bug: HDFS-10481 <https://issues.apache.org/jira/browse/HDFS-10481> If you’re using CDH, this is fixed in CDH5.5.5, CDH5.7.2 and CDH5.8.2 Wei-Chiu Chuang A very happy Clouderan > On Oct 11, 2016, at 8:38 AM, Benjamin Ross <br...@lattice-engine

Re: Where does Hadoop get username and group mapping from for linux shell username and group mapping?

2016-10-14 Thread Wei-Chiu Chuang
t most of it applies to 2.7/2.8 Wei-Chiu Chuang A very happy Clouderan > On Oct 14, 2016, at 11:33 AM, Ravi Prakash <ravihad...@gmail.com> wrote: > > Chen! > > It gets it from whatever is configured on the Namenode. > https://hadoop.apache.org/docs/r2.7.2/h

Re: Encrypt a directory using some key (JAVA)

2016-12-14 Thread Wei-Chiu Chuang
Hi If you have access to Hadoop codebase, take a look at CryptoAdmin class, which implements these two commands. Internally, the commands are implemented via DistributedFileSystem#createEncryptionZone and DistributedFileSystem#listEncryptionZones Regards, Wei-Chiu Chuang A very happy

Re: SocketTimeoutException in DataXceiver

2016-12-20 Thread Wei-Chiu Chuang
This looks like a general issue, and there are multiple possible explanations. It could be either a flaky NIC, or flaky network switches. On the other hand, if the DataNode is busy and all dataXceiver threads are used (by default: 4096 threads), this error may also be seen at the client side.

Re: Sqoop and kerberos ldap hadoop authentication

2017-09-07 Thread Wei-Chiu Chuang
Hi, The message "User xxx not found" feels more like a group mapping error. Do you have the relevant logs? Integrating AD with Hadoop can be non-trivial, and Cloudera's general recommendation is to use a third-party authentication integrator like SSSD or Centrify, instead of using LdapGroupsMapping.

Re: Hive - Json Serde - ORC

2017-12-06 Thread Wei-Chiu Chuang
Hi I think you are better off asking this question at the hive mailing list. Best On Wed, Dec 6, 2017 at 6:43 AM, kaducangica . wrote: > Hi all, > > i have a very complex json that i need to insert in a hive table. A json > example follws attached. > > First of all i

Re: Automatic Failover to different Data Center.

2018-05-07 Thread Wei-Chiu Chuang
Distcp is a backup tool, not a synchronization tool. At best, you get a point-in-time snapshot of DC1. For example, a periodic schedule of distcp every night at 12am. But in case of total failure, you lose everything from that point in time. On Mon, May 7, 2018 at 12:30 AM, akshay naidu

Re: Security problem extra

2018-06-27 Thread Wei-Chiu Chuang
Hi Zongtian, This is definitely not a JDK issue. This is the wire-protocol compatibility between client and server (DataNode). bq. what the client mean, it mean the application running on hdfs, how does it have a encryption? I'm not quite sure what you asked. HDFS supports at-rest encryption,

Re: Aws EMR Hadoop Web Access

2018-01-09 Thread Wei-Chiu Chuang
There's a project called Apache Knox that seems to offer what you need. https://hortonworks.com/apache/knox-gateway/ On Tue, Jan 9, 2018 at 2:20 PM, Jhon Anderson Cardenas Diaz < jhonderson2...@gmail.com> wrote: > According to aws documentation for EMR web access: > > > > *Setup Web

Re: UserGroupInformation and Kerberos

2018-01-02 Thread Wei-Chiu Chuang
Hi Jorge, If you use Hadoop library as a client, and your first login using key is via UserGroupInformation#loginUserFromKeytab(), the client automatically relogins again using keytab when it gets an exception (see o.a.h.ipc.Client#handleSaslConnectionFailure). Note: using

Re: HDFS User impersonation on encrypted zone | Ranger KMS

2018-08-02 Thread Wei-Chiu Chuang
Hi, this is a supported use case. Please make sure you configure the KMS proxy user correctly as well (it is separate from the HDFS proxy user settings) https://hadoop.apache.org/docs/current/hadoop-kms/index.html#KMS_Proxyuser_Configuration On Thu, Aug 2, 2018 at 12:30 PM Ashish Tadose wrote: >

Re: Hadoop impersonation not handling permissions

2018-07-30 Thread Wei-Chiu Chuang
Pretty sure this is the expected behavior. From the stack trace, your impersonation is configured correctly (i.e. it successfully performs the operation on behalf of user b); the problem is that your file doesn't allow b to access it. On Mon, Jul 30, 2018 at 1:25 PM Harinder Singh <

Re: spark structured streaming jobs working in HDP2.6 fail in HDP3.0

2018-08-30 Thread Wei-Chiu Chuang
Hi Lian, I don't know much about Spark structured streaming, but judging from the stack trace, your application was trying to access HftpFileSystem, which is removed in Apache Hadoop 3. Most likely it is removed in HDP3.0 too (Hortonworks folks can confirm). This is documented in CDH6.0 release

Re:

2018-12-20 Thread Wei-Chiu Chuang
+Hdfs-dev Hi Shuubham, Just like to clarify a bit. What's the purpose of this work? Is this for the general block placement policy in HDFS, or the balancer/mover/diskbalancer, or decommissioning/recommissioning? Block placement is determined by NameNode. Do you intend to shorten the time to

Re: Question about KMS

2018-12-10 Thread Wei-Chiu Chuang
Hi Xiaodong, Generally speaking, admin operations are not in DistributedFileSystem class. Some of the admin APIs may be found in HdfsAdmin (erasure coding, storage policy APIs) In this case, KeyProvider#createKey() does exactly what you want. On Mon, Dec 10, 2018 at 6:58 PM wrote: > hello,

Re: HDFS DirectoryScanner is bothering me

2018-12-04 Thread Wei-Chiu Chuang
Do you have a heap dump? Without a heap dump it's not easy to definitively point to DirectoryScanner for GC issues. That said, I did notice DirectoryScanner holding the global lock for quite a few seconds periodically, but that's unrelated to GC. On Thu, Nov 29, 2018 at 12:56 AM Yen-Onn Hiu wrote: >

Re: Files vs blocks

2019-01-29 Thread Wei-Chiu Chuang
I don't feel this is strictly a small file issue (since I am not seeing the average file size) But it looks like your directory/file ratio is way too low. I've seen that when Hive creates too many partitions. That can render Hive queries inefficient. On Tue, Jan 29, 2019 at 2:09 PM Sudhir Babu

Re: Right to be forgotten and HDFS

2019-04-15 Thread Wei-Chiu Chuang
Wow, Chao, didn't realize you guys are making Hudi into Apache :) HDFS is generally not a good fit for this use case. I've seen people using Kudu for GDPR compliance. On Mon, Apr 15, 2019 at 11:11 AM Chao Sun wrote: > Checkout Hudi (https://github.com/apache/incubator-hudi) which adds > upsert

Re: Python Hadoop Example

2019-06-16 Thread Wei-Chiu Chuang
Thanks Artem, Looks interesting. I honestly didn't know what Hadoop Streaming API is used for. Here are more references: https://hadoop.apache.org/docs/r3.2.0/hadoop-streaming/HadoopStreaming.html I think it brings up another question: how do we treat Python as a first-class citizen. Especially

Re: [DISCUSS] HDFS roadmap/wish list

2019-06-10 Thread Wei-Chiu Chuang
, for that matter) On Mon, Jun 10, 2019 at 12:11 PM Sudeep Singh Thakur < sudeepthaku...@gmail.com> wrote: > Hi , > > Examples are most helpful for developer. Please add examples as much as we > can. > > Thanks > Sudeep Thakur > > On Mon, Jun 10, 2019, 10:38 PM Wei-Chiu

Re: NVMe Over fabric performance on HDFS

2019-06-25 Thread Wei-Chiu Chuang
There are a few Intel folks contributing NVMe-related features in HDFS. They are probably the best source for this question. Without having access to the NVMe hardware, it is hard to tell. I learned GCE offers Intel Optane DC Persistent Memory attached instances. That can be used for tests if any

HDFS Scalability Limit?

2019-06-13 Thread Wei-Chiu Chuang
Hi community, I am currently drafting an HDFS scalability guideline doc, and I'd like to understand any data points regarding HDFS scalability limits. I'd like to share it publicly eventually. As an example, through my workplace, and through community chatter, I am aware that HDFS is capable of

Re: [DISCUSS] HDFS roadmap/wish list

2019-06-13 Thread Wei-Chiu Chuang
m > dfs.datanode.balance.bandwidthPerSec but only for re-replication. > I am pretty sure I've got people asking about this before a few times. > > Thanks and regards > JL > > On Mon, Jun 10, 2019 at 7:08 PM, Wei-Chiu Chuang > wrote: > >> Hi! >> >> I am solic

Re: [DISCUSS] HDFS roadmap/wish list

2019-06-11 Thread Wei-Chiu Chuang
ve to worry about my worker nodes having volume-level or > whole-disk-level encryption. Even if I have Hadoop traffic only crossing a > LAN that's captive to the cluster, I might still have to worry about worker > nodes being stolen outright or having the drive(s) taken out of them. > >

[DISCUSS] HDFS roadmap/wish list

2019-06-10 Thread Wei-Chiu Chuang
Hi! I am soliciting feedback on HDFS roadmap items and a wish list for future Hadoop releases. A community meetup is happening soon, and perhaps

Re: HDFS Scalability Limit?

2019-06-15 Thread Wei-Chiu Chuang
at the per-datanode limit is. > > Thanks and 73 > Kihwal > > On Thu, Jun 13, 2019 at 1:57 PM Wei-Chiu Chuang > wrote: > >> Hi community, >> >> I am currently drafting a HDFS scalability guideline doc, and I'd like to >> understand any data

Re: Webhdfs and S3

2019-05-22 Thread Wei-Chiu Chuang
I've never tried, but it seems possible to start a Httpfs server with fs.defaultFS = s3a://your-bucket Httpfs server speaks WebHDFS protocol so your webhdfs client can use webhdfs. And then for each webhdfs request, httpfs server translates that into the corresponding FileSystem API call. If the

Re: Webhdfs and S3

2019-05-22 Thread Wei-Chiu Chuang
ive > hdfs as well as S3 in the same cluster. If we change fs.defaultFS then I > would not be able to access the HDFS storage. > > > > *From:* Wei-Chiu Chuang > *Sent:* Wednesday, May 22, 2019 9:36 AM > *To:* Joseph Henry > *Cc:* user@hadoop.apache.org > *Subject:* Re:

Re: Namenode crashes in 2.7.2

2019-07-11 Thread Wei-Chiu Chuang
Hi Kumar, It seems like the fix for this bug addresses the root cause of the problem, but doesn't seem to help when the NameNode already suffers this problem. I would suggest downloading Hadoop 2.7.2 and adding a try-catch block to catch/swallow the NPE. Rebuild it and see if the NameNode

Re: Is hadoop maintained?

2019-07-08 Thread Wei-Chiu Chuang
Yuri, FreeBSD is not currently a supported operating system for Hadoop, and as far as I know it receives pretty limited attention in the community. Last time I checked, Hadoop source code (Hadoop 2.x) does not compile on FreeBSD out of the box, and FreeBSD's port has some source code changes in order

Re: Hadoop HDFS Fault Injection

2019-08-14 Thread Wei-Chiu Chuang
Aleksander, Yes I am aware of that doc but I've never seen any one maintaining that piece of code in the last 4 years. And I don't think any one had ever used that. On Wed, Aug 14, 2019 at 5:12 AM Aleksander Buła < ab370...@students.mimuw.edu.pl> wrote: > Hi, > > I would like to ask whether the

Re: What do you think about HDFS using GFS2 (shared disk file system) or GPFS (parallel filesystem) rather than local file system?

2019-08-17 Thread Wei-Chiu Chuang
Not familiar with GPFS, but looking at IBM's website, GPFS has a client that emulates Hadoop RPC https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.1/com.ibm.spectrum.scale.v4r21.doc/bl1adv_Overview.htm So you can just use GPFS like HDFS. It may be the quickest way to approach this use

[DISCUSS] move storage community online sync schedule

2019-08-19 Thread Wei-Chiu Chuang
I received some feedback that the bi-weekly storage online sync that happens Wednesday 9AM US pacific time (GMT-8) is too early for west coast folks, and the fact is that the majority of Hadoop developers are in the US. Would it make sense to move it to a later time to allow more US west coast

Hadoop storage community online sync

2019-08-19 Thread Wei-Chiu Chuang
For this week, We will have Konstantin and the LinkedIn folks to discuss a recent project that's been baking for quite a while. This is an exciting project as it has the potential to improve NameNode's throughput by 40%. HDFS-14703 NameNode

Re: Hadoop storage community online sync

2019-08-20 Thread Wei-Chiu Chuang
zone (probably more specifically, California) So GMT-7 it is. On Mon, Aug 19, 2019 at 11:16 PM Akira Ajisaka wrote: > Thank you for the information. > > Now US pacific time is GMT-7, isn't it? > > -Akira > > On Tue, Aug 20, 2019 at 6:56 AM Wei-Chiu Chuang > wrote: >

Re: Hadoop storage community online sync

2019-08-21 Thread Wei-Chiu Chuang
to fix or revert before developing the namespace partitioning feature. On Mon, Aug 19, 2019 at 2:55 PM Wei-Chiu Chuang wrote: > For this week, > We will have Konstantin and the LinkedIn folks to discuss a recent project > that's been baking for quite a while. This is an exciting project as

Re: [DISCUSS] move storage community online sync schedule

2019-08-20 Thread Wei-Chiu Chuang
> +1 for 10 pm. > > > > BR, > > - Sid > > > > On Mon, Aug 19, 2019, 8:36 AM Wei-Chiu Chuang > > > wrote: > > > >> I received some feedbacks that the bi-weekly storage online sync that > >> happens Wednesday 9AM US pacific time (GMT-8) is t

Re: Hadoop Community Sync Up Schedule

2019-08-20 Thread Wei-Chiu Chuang
+1 On Mon, Aug 19, 2019 at 8:32 PM Wangda Tan wrote: > Hi folks, > > We have run community sync up for 1.5 months. I spoke to folks offline and > got some feedback. Here's a summary of what I've observed from sync ups and > talked to organizers. > > Following sync ups have very good

Re: Is shortcircuit-read (SCR) really fast?

2019-08-30 Thread Wei-Chiu Chuang
Interesting benchmark. Thank you, Daegyu. Can you try a larger file too? Like 128 MB or 1 GB? HDFS is not optimized for smaller files. What did you use for the benchmark? On Thu, Aug 29, 2019 at 11:40 PM, Daegyu Han wrote: > Hi all, > > Is ShortCircuit read faster than legacy read which goes through data nodes? >

Re: Is shortcircuit-read (SCR) really fast?

2019-09-04 Thread Wei-Chiu Chuang
; > Intuitively, I think SCR that client directly read file should be faster > than legacy read. > > However, the first step which is requesting file is synchronous and can be > overhead when using fast nvme ssd. > > > What do you think? > > > Thank you > > &

Hadoop Storage Online Sync Notes 8/5/2019

2019-08-07 Thread Wei-Chiu Chuang
Very happy to have CR from Uber leading today's discussion. Here's today's sync meeting notes. https://docs.google.com/document/d/1jXM5Ujvf-zhcyw_5kiQVx6g-HeKe-YGnFS_1-qFXomI/edit 8/5/2019 CR Hota (Uber) gave an update on Router Based Federation Attendee: Cloudera (Weichiu, Adam), Uber (CR Hota)

Topics for Hadoop storage online sync

2019-08-05 Thread Wei-Chiu Chuang
Hello! For this week's community online sync (English, Wednesday 9am US Pacific Time), we will have CR Hota from Uber to talk about the latest update in Router Based Federation. He will touch upon the following topics: 1. Security (Development and zookeeper scale testing learnings) 2. Isolation

Please file a JIRA for your PR

2019-08-08 Thread Wei-Chiu Chuang
The Hadoop community welcomes your patch contributions, and increasingly patches are submitted via GitHub Pull Requests. That is great, as it reduces the friction to review code & commit code. However, please make sure to file a jira for your PR, as described in the How to Contribute

[DISCUSS] EOL 2.8 or another 2.8.x release?

2019-07-24 Thread Wei-Chiu Chuang
The last 2.8 release (2.8.4) was made last May, more than a year ago. https://hadoop.apache.org/old/releases.html How do folks feel about the fate of branch-2.8? During the last community meetup in June, it sounded like most users are still on 2.8 or even 2.7, so I don't think we want to

Re: [DISCUSS] EOL 2.8 or another 2.8.x release?

2019-07-25 Thread Wei-Chiu Chuang
My bad -- Didn't realize I was looking at the old Hadoop page. Here's the correct list of releases. https://hadoop.apache.org/releases.html On Thu, Jul 25, 2019 at 12:49 AM 张铎(Duo Zhang) wrote: > IIRC we have a 2.8.5 release? > > On the download page: > > 2.8.5 2018 Sep 15 >

Re: Understanding the relationship between block size and RPC / IPC length?

2019-11-08 Thread Wei-Chiu Chuang
There are more details in this jira: https://issues.apache.org/jira/browse/HADOOP-16452 Denser DataNodes are common. It is not uncommon to find a DataNode with > 7 > million blocks these days. > With such a high number of blocks, the block report message can exceed the > 64mb limit (defined by

Fwd: This week's Hadoop storage community online sync

2019-10-28 Thread Wei-Chiu Chuang
, Weichiu -- Forwarded message - From: Wei-Chiu Chuang Date: Mon, Oct 28, 2019 at 7:41 PM Subject: This week's Hadoop storage community online sync To: Hdfs-dev , Hadoop Common < common-...@hadoop.apache.org> Hello, I am super stoked to have Yiqun Lin with us this Wednesday m

Hadoop meetup at Yahoo this Tuesday evening

2019-10-28 Thread Wei-Chiu Chuang
Hi, I don't think this meetup information is shared in the user mailing list, so here it is: https://www.meetup.com/hadoop/events/265963792 Join us at Yahoo’s HQ for awesome presentations (Uber, eBay, Cloudera, Yahoo/Verizon Media), conversations, & networking! Pizza & refreshments will be

Re: Hadoop and OpenSSL 1.1.1

2019-10-09 Thread Wei-Chiu Chuang
See https://issues.apache.org/jira/browse/HADOOP-14597 OpenSSL 1.1.0 is supported with Hadoop 3. We should backport this to Hadoop 2. I don't recall if we ever documented which OpenSSL versions are supported. Would be nice to add that too. On Wed, Oct 9, 2019 at 12:09 AM Gonzalo Gomez wrote: > Hi, any

Re: Hadoop and OpenSSL 1.1.1

2019-10-09 Thread Wei-Chiu Chuang
already EOL'd and we need to look into supporting 1.1.1 (another 4 years until EOL) https://www.openssl.org/policies/releasestrat.html On Wed, Oct 9, 2019 at 8:55 AM Wei-Chiu Chuang wrote: > See https://issues.apache.org/jira/browse/HADOOP-14597 > OpenSSL 1.1.0 is supported with Hadoop 3

Re: Hadoop and OpenSSL 1.1.1

2019-10-09 Thread Wei-Chiu Chuang
Filed HADOOP-16647 <https://issues.apache.org/jira/browse/HADOOP-16647> I am not planning to work on this any time soon so if any one is interested feel free to pick it up/supply additional information. On Wed, Oct 9, 2019 at 9:19 AM Wei-Chiu Chuang wrote: > Ok I stand

Re: [DISCUSS] fate of branch-2.9

2020-03-04 Thread Wei-Chiu Chuang
On Mon, Mar 2, 2020 at 5:12 PM, Wei-Chiu Chuang > wrote: Hi, > > Following the discussion to end branch-2.8, I want to start a discussion > around what's next with branch-2.9. I am hesitant to use the word "end of > life" but consider these facts: > > * 2.9.0 was re

[ANNOUNCE] Creation of user-zh mailing list

2020-02-28 Thread Wei-Chiu Chuang
che.org. Non-subscribers may also post messages after the moderators' approvals. - Wei-Chiu Chuang (on behalf of the Apache Hadoop PMC)

[Announcement] Creation of the user-zh mailing list

2020-02-28 Thread Wei-Chiu Chuang
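changes should still be made in English on *-dev@, JIRAs and GitHub. This mailing list is now available, and our website will list it after the next update. Anyone can subscribe by sending an email to user-zh-subscr...@hadoop.apache.org. Messages from non-subscribers can be posted after moderator approval. - Wei-Chiu Chuang (on behalf of the Apache Hadoop PMC) On Fri, Feb 28, 2020 at 9:30 AM Wei-Chiu Chuang wrote: > Hi! > > Apache Hadoop welcomes contributors from around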

Re: How do I validate Data Encryption on Block data transfer?

2020-02-05 Thread Wei-Chiu Chuang
I don't know the answer to the question off the top of my head. Tracking the source code, it looks like the data transfer encryption does not really depend on Kerberos. That said, (1) the Hadoop data transfer encryption relies on the data encryption key distributed by the NameNode. If a client

Re: [DISCUSS] fate of branch-2.9

2020-03-06 Thread Wei-Chiu Chuang
ommunity that does not have the resources to manage > multiple release lines, > you guys sure like to multiply release lines a lot. > > Cheers > Rupert > > On Wed, Mar 4, 2020 at 7:40 PM, Wei-Chiu Chuang > wrote: > >> Forwarding the discussion thread from the dev maili

Re: [DISCUSS] fate of branch-2.9

2020-03-06 Thread Wei-Chiu Chuang
manage >> multiple release lines, >> you guys sure like to multiply release lines a lot. >> >> Cheers >> Rupert >> >> On Wed, Mar 4, 2020 at 7:40 PM, Wei-Chiu Chuang >> wrote: >> >>> Forwarding the discussion thread from the dev

Re: Hadoop Storage community call

2020-04-01 Thread Wei-Chiu Chuang
Reminder -- this call is happening in about 2 hours. Stay safe! Weichiu On Tue, Mar 24, 2020 at 5:44 PM Wei-Chiu Chuang wrote: > Hi! > > For the bi-weekly Hadoop Storage community next week, we'll do something > different this time: > > Gabor Bota is going to talk about &quo

Re: How to identify active namenode?

2020-05-11 Thread Wei-Chiu Chuang
You can also check the namenode status through the namenode web UI/JMX. On Sat, May 2, 2020 at 1:27 AM Ayush Saxena wrote: > Hi, > Can you check : > > https://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html#Administrative_commands > > You can use
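The JMX route mentioned above can be scripted. Here is a minimal sketch, not a definitive tool. It assumes the NameNode web UI is on port 9870 (the Hadoop 3 default; Hadoop 2 typically uses 50070), that the endpoint is reachable without Kerberos/SPNEGO, and that the `NameNodeStatus` bean exposes a `State` field. The host names passed in are placeholders.

```python
import json
from urllib.request import urlopen

# JMX bean that reports the HA state of a NameNode.
STATUS_QUERY = "Hadoop:service=NameNode,name=NameNodeStatus"


def jmx_url(host, port=9870):
    """Build the JMX servlet URL for one NameNode (the port is an assumption)."""
    return f"http://{host}:{port}/jmx?qry={STATUS_QUERY}"


def parse_state(jmx_json):
    """Extract the HA state (e.g. 'active' or 'standby') from a JMX response body."""
    beans = json.loads(jmx_json).get("beans", [])
    return beans[0]["State"] if beans else None


def find_active(hosts, port=9870, timeout=5):
    """Return the first host whose NameNode reports 'active', or None."""
    for host in hosts:
        try:
            with urlopen(jmx_url(host, port), timeout=timeout) as resp:
                if parse_state(resp.read().decode("utf-8")) == "active":
                    return host
        except OSError:
            continue  # unreachable NameNode: try the next one
    return None
```

Calling `find_active(["nn1.example.com", "nn2.example.com"])` would probe each placeholder host in turn; on a secured cluster the HTTP call would need SPNEGO, which this sketch does not handle.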

[ANNOUNCEMENT] Apache Hadoop 2.8.x release line end of life

2020-03-09 Thread Wei-Chiu Chuang
to newer release lines: 2.10.0 / 3.1.3 / 3.2.1. Please check out our Release EOL wiki for details: https://cwiki.apache.org/confluence/display/HADOOP/EOL+(End-of-life)+Release+Branches Best Regards, Wei-Chiu Chuang (On Behalf of the Apache Hadoop PMC)

Fwd: Hadoop Storage community call

2020-03-24 Thread Wei-Chiu Chuang
Forwarding this to the user mailing list since the topic may be more interesting for users. -- Forwarded message - From: Wei-Chiu Chuang Date: Tue, Mar 24, 2020 at 5:44 PM Subject: Hadoop Storage community call To: Hadoop Common , Hdfs-dev < hdfs-...@hadoop.apache.org> Cc:

Hadoop & ApacheCon

2020-09-01 Thread Wei-Chiu Chuang
Hello, This year's ApacheCon will take place online between 9/29 and 10/1. There are lots of sessions made by our fellow Hadoop developers: https://apachecon.com/acah2020/tracks/bigdata-1.html https://apachecon.com/acah2020/tracks/bigdata-2.html In case you didn't realize, the registration is

[ANNOUNCEMENT] Apache Hadoop 2.9.x release line end of life

2020-09-07 Thread Wei-Chiu Chuang
to newer release lines: 2.10.0 / 3.1.4 / 3.2.1 / 3.3.0. Please check out our Release EOL wiki for details: https://cwiki.apache.org/confluence/display/HADOOP/EOL+(End-of-life)+Release+Branches Best Regards, Wei-Chiu Chuang (On Behalf of the Apache Hadoop PMC)

Increased DN heap usage during Hadoop 3 upgrade

2020-10-05 Thread Wei-Chiu Chuang
I have anecdotally learned of multiple data points where during the upgrade from Hadoop 2 to Hadoop 3, DN heap usage increases to the point where it goes OOM. I don't have many logs for this issue, but I suspect it's caused by the layout change added in Hadoop 2.8.0. Does anyone else observe the

Re: Increased DN heap usage during Hadoop 3 upgrade

2020-10-05 Thread Wei-Chiu Chuang
o you use? > > On Mon, Oct 5, 2020 at 10:03 AM Wei-Chiu Chuang > wrote: > >> I have anecdotally learned of multiple data points where during the >> upgrading from Hadoop 2 to Hadoop 3, DN heap usage increases to the point >> where it goes OOM. >> >> Don't have

Re: [E] Re: Increased DN heap usage during Hadoop 3 upgrade

2020-10-06 Thread Wei-Chiu Chuang
ts? > > Kihwal > > On Mon, Oct 5, 2020 at 1:40 PM Wei-Chiu Chuang > wrote: > >> We experienced this issue on CDH6 and HDP3, so roughly Hadoop 3.0.x and >> 3.1.x. >> Hermanth experienced the same issue on Hadoop 3.1.1 as well (H

Re: [DISCUSS] fate of branch-2.9

2020-08-26 Thread Wei-Chiu Chuang
, Weichiu On Fri, Mar 6, 2020 at 5:47 PM Wei-Chiu Chuang wrote: > I think that's a great suggestion. > Currently, we make 1 minor release per year, and within each minor release > we bring up 1 thousand to 2 thousand commits in it compared with the > previous one. > I can tot

Re: Hadoop monitoring using Prometheus

2020-06-02 Thread Wei-Chiu Chuang
Check out HADOOP-16398 It's a new feature in Hadoop 3.3.0 Akira might be able to help. On Tue, Jun 2, 2020 at 5:56 PM ravi kanth wrote: > Hi Everyone, > > We have a production-ready cluster with 35 nodes that we are currently > using. We are

Re: dfs.namenode.replication.min can set by client while reading/writing hdfs files

2020-07-17 Thread Wei-Chiu Chuang
It’s a system-wide setting. Yes, it is configurable. No, it is generally a bad idea to set it to anything other than 1. Hadoop has not been properly tested with this value set to 2 or above. We really should update the description of this config and say “no, you really don’t want to change it”

Re: HDFS upgrade skip versions?

2020-12-15 Thread Wei-Chiu Chuang
Probably one of the protobuf incompatibilities. Unfortunately we don't have an open source tool to detect protobuf incompat. A few related issues: 1. HDFS-15700 2. HDFS-14726

[ANNOUNCE] Apache Hadoop 3.3.1 release

2021-06-15 Thread Wei-Chiu Chuang
olks who continued helps for this release process. Best Regards, Wei-Chiu Chuang

Re: PySpark Write File Container exited with a non-zero exit code 143

2021-05-19 Thread Wei-Chiu Chuang
Have you checked the executor log? In most cases the executor fails like that because of insufficient memory. You should be able to see more details looking at the executor log. On Thu, May 20, 2021 at 3:28 AM Clay McDonald < stuart.mcdon...@bateswhite.com> wrote: > Hello all, > > > > I’m hoping

Three Hadoop talks in this year's ApacheCon Asia

2021-07-12 Thread Wei-Chiu Chuang
For your information, While drafting the upcoming quarterly report, I found there are three talks that are directly related to Hadoop in this year's ApacheCon Asia. https://apachecon.com/acasia2021/tracks/bigdata.html Bigtop 3.0: Rerising community driven Hadoop distribution

Re: Any comment on the log4j issue?

2021-12-17 Thread Wei-Chiu Chuang
I filed a jira HADOOP-18050 and posted a PR to document our stance on the log4jshell vulnerability. Please review. On Fri, Dec 17, 2021 at 5:59 PM Brahma Reddy Battula wrote: > > > CVE-2021-44228 states that, it will affect the Apache Log4j2

Apache Hadoop and CVE-2021-44228 Log4JShell vulnerability

2021-12-19 Thread Wei-Chiu Chuang
Hi, Given the widespread attention to the recent log4j vulnerability (CVE-2021-44228), I'd like to share an update from the Hadoop developer community regarding the incident. As you probably know, Apache Hadoop depends on the log4j library to keep log files. The highlighted vulnerability

Next Mandarin Hadoop Online Meetup Jan 6th.

2022-01-03 Thread Wei-Chiu Chuang
Hello community, This week we're going to have Tao Li (tomscut) speaking about the experience of operating HDFS at BIGO. See you on Thursday! Topic: "HDFS in Practice at BIGO"

Re: Next Mandarin Hadoop Online Meetup Jan 6th.

2022-01-05 Thread Wei-Chiu Chuang
Just a gentle reminder this is happening now. On Mon, Jan 3, 2022 at 5:39 PM Wei-Chiu Chuang wrote: > Hello community, > > This week we're going to have Tao Li (tomscut) speaking about the > experience of operating HDFS at BIGO. See you on Thursday! > > Topic: "HDFS in Practice at BIGO" >

Re: Next Mandarin Hadoop Online Meetup Jan 6th.

2022-01-09 Thread Wei-Chiu Chuang
/JaNm70lZQGCZdlFzh9ZbsfrR7MJ7Nazb2g6NCtYPqsRLWtyEhLfgwXOppzMR3csp.HqRJNGXUGSaPu1qw Access Passcode: 4g1ZF&%f On Mon, Jan 3, 2022 at 5:39 PM Wei-Chiu Chuang wrote: > Hello community, > > This week we'll going to have Tao Li (tomscut) speaking about the > experience of operating HDFS at BIGO. See you on Thursday! > > 题目:《HDFS在BIG

Re: Quick check on Log4j/Reload4j plan

2022-03-01 Thread Wei-Chiu Chuang
On Wed, Mar 2, 2022 at 2:43 AM Brent wrote: > Hey all, > > I've been trying to go through Jira issues and mailing list archives to > understand ongoing plans for Log4j 1.x upgrades. I know technically Hadoop > is not listed as vulnerable, but some more cautious organizations are > looking to

Re: [ANNOUNCE] Apache Hadoop 3.3.2 release

2022-03-03 Thread Wei-Chiu Chuang
> > Many thanks to Viraj Jasani, Michael Stack, Masatake Iwasaki, Xiaoqiao He, > Mukund Madhav Thakur, Wei-Chiu Chuang, Steve Loughran, Akira Ajisaka and > other folks who helped for this release process. > > Best Regards, > Chao >

Re: Quick check on Log4j/Reload4j plan

2022-03-04 Thread Wei-Chiu Chuang
That would be great! Would you like to start another thread to kick off the 2.10.x release plan? On Thu, Mar 3, 2022 at 9:39 PM Masatake Iwasaki wrote: > Hi Wei-Chiu Chuang, > > > I think a bigger question is whether or not we have someone who would > like to volunteer to be a

Fwd: Join us at the Storage User Group Meetup!

2023-10-17 Thread Wei-Chiu Chuang
-- Forwarded message - From: Wei-Chiu Chuang Date: Mon, Oct 16, 2023 at 11:28 AM Subject: Join us at the Storage User Group Meetup! To: Hdfs-dev Hi Please join us at the Storage Meetup at Cloudera's office next Wednesday. https://www.meetup.com/futureofdata-sanfrancisco/events

Re: Deploy multi-node Hadoop with Docker

2023-09-22 Thread Wei-Chiu Chuang
Hadoop's Docker image is not for production use. That's why But we should update that if people are thinking of using it for production. Not familiar with docker compose but contributions are welcome: https://github.com/apache/hadoop/blob/docker-hadoop-3/docker-compose.yaml On Fri, Sep 22, 2023

Re: Compare hadoop and ytsaurus

2023-09-28 Thread Wei-Chiu Chuang
Hey Kirill, Thanks for sharing! I wasn't aware of this project. According to the blog post https://medium.com/yandex/ytsaurus-exabyte-scale-storage-and-processing-system-is-now-open-source-42e7f5fa5fc6 It was released in public earlier this year by Yandex. It was inspired by Google's MapReduce,

Re: Namenode Connection Refused

2023-10-24 Thread Wei-Chiu Chuang
If it's an HA cluster, is it possible the client doesn't have the proper HA configuration so it doesn't know what host name to connect to? Otherwise, the usual suspect is the firewall configuration between the client and the NameNode. On Tue, Oct 24, 2023 at 9:05 AM Harry Jamison wrote: > I

Re: Questions regarding setting AWS application load balancer for YARN RM

2022-06-15 Thread Wei-Chiu Chuang
Not familiar with AWS, but this warning can be worked around by raising the IPC default length limit: see if you can update core-site.xml and change the property ipc.maximum.response.length, which has the default of 128MB (=128*1024*1024), to something bigger, such as 256MB. On Thu, Jun 16, 2022 at
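The property value is given in bytes, so the arithmetic is worth spelling out. The snippet below is only a sketch that renders a core-site.xml fragment for a chosen limit; the helper name is made up for illustration, and 256 is simply the example figure from the message above.

```python
MIB = 1024 * 1024  # bytes per mebibyte


def ipc_length_property(megabytes, name="ipc.maximum.response.length"):
    """Render a core-site.xml <property> element with the limit in bytes.

    Hypothetical helper: it only formats text and does not touch any cluster.
    """
    value = megabytes * MIB
    return (
        "<property>\n"
        f"  <name>{name}</name>\n"
        f"  <value>{value}</value>\n"
        "</property>"
    )


# 128 MB (the default quoted above) is 134217728 bytes; 256 MB is 268435456.
print(ipc_length_property(256))
```

The printed fragment would go inside the `<configuration>` element of core-site.xml on the side that reads the large response.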

Re: Hdfs namenode consume much memory that are not expected

2022-08-01 Thread Wei-Chiu Chuang
Hi, Not familiar with pmap, but G1GC is not recommended for such a big heap. To troubleshoot further, I usually run jmap -histo to get a list of top objects that use the most heap memory. On Mon, Aug 1, 2022 at 3:08 AM Micro dong wrote: > > we deploy hdfs in our company. we meet a unormal

Re: CVE-2022-42889

2022-10-27 Thread Wei-Chiu Chuang
1. HADOOP-18497 On Thu, Oct 27, 2022 at 4:45 AM Deepti Sharma S wrote: > Hello Team, > > > > As we have received the vulnerability “CVE-2022-42889”. We are using > Apache Hadoop common 3pp version 3.3.3 which has transitive dependency of

Re: Performance with large no of files

2022-10-10 Thread Wei-Chiu Chuang
Do you have security enabled? We did some preliminary benchmarks around webhdfs (I really want to revisit it again) and with security enabled, a lot of overhead is between client and KDC (SPNEGO). Running webhdfs with delegation tokens should help remove that bottleneck. On Sat, Oct 8, 2022 at
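To illustrate the delegation-token idea: the token is fetched once (that single request still authenticates via SPNEGO), and later requests carry it in the `delegation` query parameter instead of negotiating with the KDC each time. The sketch below only builds the WebHDFS URLs and parses the token response; the base URL, renewer and paths are placeholders, and the actual HTTP calls and SPNEGO handling are left out.

```python
import json
from urllib.parse import quote, urlencode


def webhdfs_url(base, path, op, **params):
    """Build a WebHDFS REST URL for a file system path and operation."""
    query = urlencode({"op": op, **params})
    return f"{base}/webhdfs/v1{quote(path)}?{query}"


def token_request_url(base, renewer):
    """URL that asks the NameNode for a delegation token (needs SPNEGO once)."""
    return webhdfs_url(base, "/", "GETDELEGATIONTOKEN", renewer=renewer)


def parse_token(response_body):
    """Pull the URL-safe token string out of the GETDELEGATIONTOKEN response."""
    return json.loads(response_body)["Token"]["urlString"]


def with_token(base, path, op, token):
    """URL for a follow-up request authenticated by the delegation token."""
    return webhdfs_url(base, path, op, delegation=token)
```

For example, `with_token(base, "/data/part-0", "OPEN", token)` yields a read URL that can be issued repeatedly without a Kerberos round trip until the token expires.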

Re: Monitoring HDFS filesystem changes

2023-02-15 Thread Wei-Chiu Chuang
Use the inotify api https://dev-listener.medium.com/watch-for-changes-in-hdfs-800c6fb5481f https://github.com/onefoursix/hdfs-inotify-example/blob/master/src/main/java/com/onefoursix/HdfsINotifyExample.java On Wed, Feb 15, 2023 at 1:12 AM wrote: > Hello, > is there an efficient way to

[ANNOUNCE] Apache Hadoop 3.3.6 release

2023-06-26 Thread Wei-Chiu Chuang
On behalf of the Apache Hadoop Project Management Committee, I am pleased to announce the release of Apache Hadoop 3.3.6. It contains 117 bug fixes, improvements and enhancements since 3.3.5. Users of Apache Hadoop 3.3.5 and earlier should upgrade to this release.
